DataTopics Unplugged: All Things Data, AI & Tech

#61 AI is Officially Smarter Than Humans: First Look at OpenAI O1 'Strawberry'

DataTopics

Send us a text

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for laid-back banter about the latest in tech, AI, and coding.

In this episode, Jonas joins us with fresh takes on AI smarts, sneaky coding tips, and a spicy CI debate:

  • OpenAI's o1 ("Strawberry"): The team explores OpenAI's newest model, its advanced reasoning capabilities, and potential biases in benchmarks based on training methods. For a deeper dive, check out the Awesome-LLM-Strawberry project.
  • AI hits 120 IQ: Yep, with a reported IQ of 120, AI is now officially smarter than most humans. We discuss the implications for AI's future role in decision-making and society.
  • Greppability FTW: Ever struggled to find that one line of code? Bart introduces greppability—a metric for how easy it is to find code in large projects—and why it matters more than you think.
  • Pre-commit hooks: Yay or nay? Is pre-commit the best tool for Continuous Integration, or are there better ways to streamline code quality checks? The team dives into the pros and cons and shares their own experiences.
Speaker 1:

You have taste in a way that's meaningful to software people. Hello, I'm Bill Gates.

Speaker 2:

I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong. I'm reminded, incidentally, of Rust here, rust.

Speaker 3:

This almost makes me happy that I didn't become a supermodel.

Speaker 2:

Kubernetes.

Speaker 1:

Well, I'm sorry guys, I don't know what's going on.

Speaker 3:

Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here.

Speaker 1:

Rust. Data Topics. Welcome to the Data Topics podcast.

Speaker 3:

Hello and welcome to Data Topics Unplugged, your casual corner of the web where we discuss what's new in data every week, from strawberries to the speed of light. Anything goes. We're live streaming on LinkedIn and YouTube, not X anymore, but we're still on Twitch, so feel free to check us out there. Feel free to leave a comment or question and we'll try to address it. Today is the 17th of September of 2024. So, after a long break, we're back. My name is Murilo, I'll be hosting today, joined by my sidekick. I'm not sure if you can call Bart your sidekick, but Bart nonetheless. And the return of Jonas. Can we get an applause for that? Hello, hello. Hey, Jonas, how are you?

Speaker 3:

Great, great, great. Yeah, maybe for the people that missed the first time. So this is your second appearance, if I'm not mistaken, correct? You came back, so that's always good. For the people that maybe missed the first one, do you want to give a quick intro about who you are, what you do, fun facts?

Speaker 1:

So, I'm Jonas. I'm a machine learning engineer here at dataroots. I did a PhD before at KU Leuven, and now I made the jump here. I'm in industry doing cool AI, ML and MLOps things.

Speaker 3:

Cool, cool, cool. Happy to have you back again. Indeed, indeed, we have a very, very hot topic, so no one better than Jonas to give us the deets. The hot topic being... why were you off for so long? That was not what I was thinking, but maybe, why have we been away for so long?

Speaker 3:

But show it to the people that are listening: he's holding up a big golden ring. Well, you say it like that, like a gangster, you know, gold chains and all. Yeah, I have gotten married, so I was away for a while. Congratulations! Thank you, thank you.

Speaker 3:

Thanks, Alex. So, yeah, I was a bit away. Actually, the wedding was in Portugal as well, so to go there and come back I had some remote work and whatnot. We made it work. We also did the Paris Olympics. Maria, my partner, my wife, I still need to get used to saying that. So there was a lot of cool stuff as well. We went with our dogs, they were the ring bearers. Everything was perfect, so very happy. It was a lot of driving as well, going with the car because of the dogs. So now we're back, still adjusting a bit back to the routine, you know, waking up and this and that, but very happy to be back. So if I forget some things, you know, to click, to change the screen, let me know.

Speaker 1:

It's been a while.

Speaker 3:

You're rusty. I'm a bit rusty, that is it. But so, a lot of stuff happened while we were gone. I think at the end as well we can give a little update on Data Topics Unplugged going forward, right, with the deep dives and whatnot. There's a lot to talk about, but maybe the freshest news, right out of the oven, is... Do you want to take a guess, Jonas?

Speaker 3:

Yeah, it's the Strawberry. Yes, the Strawberry, OpenAI's Strawberry. Yes, what is that?

Speaker 1:

So they released a new model, or at least a preview of a new model. They call it ChatGPT o1. So they, yeah, reset their naming scheme, because they say this is such a big advancement that they start from one again. And what they say is, the advancement is that it's a reasoning model, or that's how they call it. So it's supposed to be able to reason better than GPT-4o, their previous model, and with that come big improvements in certain domains. For example, they now have benchmarks on mathematical reasoning, physics-based reasoning and a few others, where they show very, very big improvements over previous models and models of competitors.

Speaker 3:

So o1 is a change, right? I think before it was called GPT, which stands for, generative or, pre-trained transformers.

Speaker 1:

No, I'm not sure what the G stands for, but it is about the transformer.

Speaker 3:

It refers to the transformer architecture. So the idea, I mean, from what I understood as well, is that with o1 they're referring to a different type of model, right? It's not the same model trained on new data or at more scale, it's a different architecture. What is different about it? What's maybe the gist?

Speaker 2:

Its performance, maybe, to start with. Yes, it's performing way better on a certain set of tests. They tested it on competition math tests, competition code tests, PhD-level science questions, and it's clearly outperforming GPT-4o, which I think is still a bit considered the de facto standard.

Speaker 3:

GPT-4o, was it? It wasn't the best performing one though, right? The 4o was more like the smaller one.

Speaker 2:

The best performing multimodal one. There were some tests where GPT-4 was a little bit better, yeah, but it was more or less comparable. I see, I see. But the difference here, as we were discussing, I mean, it's a significant uplift from what it was.

Speaker 3:

Actually, I read quite a bit about it, but I realized now that I didn't really read the OpenAI announcement, so I'm just looking at it here for the first time. So they also have this chain of thought thing, right, that now is built in. Maybe someone wants to take a crack at what it is?

Speaker 1:

Yeah, I think this is what they at least. This is one of the things we do know about the model, because of course, they're a bit hesitant on sharing technical details, but they do mention chain of thought. So now, when you type a prompt into yeah, as you're used to in ChatGPT, but you select this preview model, you won't get a response immediately. You'll get like a small prompt, a small notification that says ah, I'm thinking, and then it will give some vague indication of what it's thinking about and only afterwards it will start generating your response.

Speaker 1:

And this is something we discussed in the previous podcast where I was. It's basically chain of thought, so the model doesn't immediately start generating its answer. It first generates additional words that are only visible to the model, and in these words it's actually thinking about things. So it might try to kind of, if it's a complex problem, it will first try to figure out simpler steps and then it will find solutions for these simpler steps. And this all happens within that hidden thinking, these thinking tokens, and then, once the model has thought enough between brackets then it will start generating response and it can use all of this thinking that it has generated before to generate its response.
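To make the idea concrete, here is a minimal sketch of the prompted chain-of-thought pattern described above: hidden "thinking" text is generated first, then the visible answer is conditioned on it. This is not OpenAI's actual o1 mechanism (which is not public), and `call_llm` is a hypothetical helper standing in for any chat-completion API.

```python
# Minimal sketch of prompted chain of thought (not OpenAI's actual o1 internals).
# `call_llm` is a hypothetical helper standing in for any chat-completion API.

def call_llm(system: str, user: str) -> str:
    """Placeholder: send a prompt to some LLM and return its text response."""
    raise NotImplementedError

def answer_with_hidden_thinking(question: str) -> str:
    # Step 1: generate "thinking tokens" the user never sees.
    thinking = call_llm(
        system="Reason step by step about the problem. Do NOT give the final answer.",
        user=question,
    )
    # Step 2: generate the visible answer, conditioned on the hidden reasoning.
    answer = call_llm(
        system="Use the scratchpad below to produce a concise final answer.",
        user=f"Question: {question}\n\nScratchpad (hidden from user):\n{thinking}",
    )
    return answer  # only this part is shown to the user
```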

Speaker 3:

Yeah, I think you explained it really well. The one thing for me, and I think this is something that some people already voiced internally in our Slack: is this something really, really new? Because, like you mentioned, we talked about this in the last podcast, so this is not the first time we see it. So would you consider this a novelty, let's say? What's new about it?

Speaker 1:

Opinions differ a lot between people. Some people will say that there's nothing new here, this is just chain of thought, we've seen this before, they just scaled it up. That's one opinion that's out there. I believe there are some new things. Maybe not in the principle, it's still a stochastic parrot, as people say, but the way it's trained is definitely probably a bit different from previous models, or at least the way it's fine-tuned is probably different. And then there's, of course, a lot of speculation about what the model actually does. There's also speculation about how the response is being generated, that this is not the same as before.

Speaker 2:

Is this new? Because, from what I understand, theoretically you can do this with a system on top, where you have a first output and then you question that output again to basically initiate a bit of a chain of thought. But here this is really encapsulated in the model, from what I understand, right? What do we know about how this was trained?

Speaker 1:

So I've read a lot on this, or at least most of the links in Slack, and of course there's a lot of speculation here, so it's difficult to know what is speculation and what we actually know. They hinted towards this chain of thought, and they also hinted towards some data-efficient way of fine-tuning the model to do this chain of thought itself. So it's not a prompt. As you could do before in ChatGPT, you could also say: first think a bit and then generate the response.

Speaker 3:

It's not the same as that, because they really fine-tuned the model with a reinforcement learning procedure to already do this on its own. So I think, well, if I understand what you're saying, because even Anthropic, right, remember we talked about the chain of thought as well, the reasoning for that Anthropic model was showing that it was trying to deceive the user to get good feedback, so it would get the good responses. And from what I understand here, the chain-of-thought process is going on during the training, I guess, and before it felt more like it was a prompting strategy, but now this is something that is really built into the model. It's like they're really doing all these things. Maybe just real quick, for people that are following on the live stream, this is just an example, I guess, right? So this is GPT-4o and this is the o1-preview. Actually, this is a preview still, right? I think the o1-mini is available, but for now it's just a preview.

Speaker 3:

Another practicality is that it's not available for everyone. I think you need a certain account with a certain spending threshold with OpenAI. I don't think anyone that has an OpenAI account, even the paid one, can just use it, I think. But basically there's the same prompt, the same question, on both left and right, for people that are just listening.

Speaker 3:

And then on the left, with GPT-4o, it already starts with the "let's break it down step by step", but then it basically gives the answer, right. And then on the right side, where you see the o1-preview, there's a chain of thought that is collapsed, so you don't see what it is, and then it gives the answer. I'm not going to go through the whole text, but I'm assuming that what's on the right is better. And then, if you click on the chain of thought, it actually shows what's going on there. So basically it's almost like, I guess, what a developer calls rubber ducking, right? Like, you just kind of say it out loud, put stuff down, and then iterate until you reach the response. Which, yeah, is a nice way to put it.

Speaker 2:

Yeah, I tried it out a bit as well. My experience, and I think this is also what you hear in the community, is that it's way better at very complex tasks, including coding tasks. Like, if you say, give me the boilerplate for this type of project, with complex enough logic, there's a much bigger chance that your first iteration will give you something that works with o1, versus GPT-4o, where you would have to ask follow-up questions.

Speaker 2:

It feels a little bit like those follow-up questions that you would normally have to ask to come to a good answer, the chain-of-thought process now does that for you, by expanding on your prompt, looking at it from different angles, and probably doing that better than you would do yourself with a follow-up question. But also, I think for a lot of simple questions the performance is very similar.

Speaker 3:

Yeah, and I think it's also, well, another article that I read, from Simon Willison. He also asked on Twitter, and I'll put this on the screen and make it a bit bigger, for examples of prompts that failed on GPT-4o but worked on o1-preview, right. So one I thought was, well, not clever, but introspective, right. The prompt is: how many words are in your response to this prompt? And GPT-4o wasn't able to count the words ahead of time, but o1 was right, which I guess makes sense when you think about it: you generate something, then you count, and if it's not right, you generate it again, you know. So that to me makes sense.

Speaker 3:

Also, explaining jokes. So: two cows are standing in the middle of a field. One cow asks the other, what do you think about the mad cow disease that's going around? The other one says, who cares? I'm a helicopter. And the o1 explanation actually made more sense. And he also mentioned that the OpenAI researcher, he gives the name Jason Wei, said it's hard even for a researcher to really quantify these things, to really find examples and really try to understand. I think we're more in abstract land here, so it's really hard to very concretely say: this is better for these kinds of things, and this is better for those kinds of things.

Speaker 2:

But there is a general consensus, and I think it's also a bit, when you get into more of a tree structure, like, the answer could be this or this or that, and maybe we want to deep dive a bit, because of this the first one seems most viable, let's deep dive into that, and then you branch out. These types of exercises seem to be where o1 really shines.

Speaker 3:

And when you say tree, it's almost like you have different possibilities and you want to explore them, see what comes out, and then go back. Which, I mean, intuitively I would agree, right? Like, if you have a chain of thought and you actually put stuff out, it's almost like you're running each simulation and then you see what actually came out of it.

Speaker 2:

Difficult thing is we don't really know, because it shows a chain of thought, but it's a summary.

Speaker 1:

Basically, that summary is being created by another AI; it's not the actual chain of thought that is behind it.

Speaker 3:

So that's another thing that I read as well: the chain of thought, you still use tokens for it. Actually, I also read that there are two types of tokens now. They call them, I think, reasoning tokens, and then there are the actual output tokens. I think that's what Jonas was mentioning, these are the thinking tokens.

Speaker 3:

Yeah, thinking tokens, exactly. Sounds very smart, right. But yeah, what I read, again, I didn't try it myself, is that a lot of what's in the chain of thought is not shown, and there are some criticisms of that. They mentioned, I think, competitive advantage, so basically they don't want their secret sauce to be out there, but also to make sure that it complies with the code of conduct, the regulations and all these things. So I'm not sure how I feel about it yet. I think, if you're really trying to build something where you really try to understand why the model is choosing A or B or C, it would be very beneficial. But yeah, I also think it's easier to not allow it and then, a year from now, say, okay, we'll make it available, than to do the opposite.

Speaker 1:

And so I think, if I were in their position, I would probably do the same thing. There were also some interesting observations that this chain of thought is still something that an LLM just generates, and it's not guaranteed that the response that is being generated is based on what was reasoned before.

Speaker 1:

There's a very extensive, they call it a model card. I didn't read it myself, but there they noted that sometimes, in the response, it didn't actually follow some of the reasoning that it had produced before.

Speaker 3:

I see. So it's basically like, the way I understand it as well, it outputs something into the conversation and that becomes context for the next answer, right. Which is in a way kind of the same thing as what RAG is doing, right? You add text to the context of the conversation, but the LLM can still hallucinate, right? It can still not use all the information you've put there and just kind of go sideways or do whatever. So yeah, it's another unknown, I guess. Another thing that I remember reading, again from the Simon Willison blog post: an interesting tip from the API documentation. Limit additional context in retrieval-augmented generation, so RAG applications, right. Basically it says: when providing additional context or documents, include only the most relevant information, to prevent the model from overcomplicating its response.

Speaker 2:

So this is specifically for the o1 model? Yes, I believe so.

Speaker 3:

So I think this is coming from the API documentation.

Speaker 2:

Yeah, that's interesting, because with RAG you typically say: around this topic, you have a database with documents; this person wants to know something, so I'm going to inject the documents that are close to that.

Speaker 3:

Yeah.

Speaker 2:

And where you would normally say okay, we know that. Like, let's just take the top 10 documents.

Speaker 3:

Yeah, not worry too much about being too precise. But here they're saying the more precise the better, because otherwise it might consider the other ones more than it should.

Speaker 3:

Indeed, which I also thought was an interesting thing. So maybe from this article as well, I'll put it in the show notes. I read it, I already forgot a lot of the stuff, but he also mentions a lot of things from the API documentation. A lot of these hints, right, because we don't know exactly what happened or what they did under the hood, let's say. But there are a lot of tips there, so cool stuff.
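To illustrate that API-doc tip, here is a small sketch of filtering retrieved documents by a relevance threshold instead of blindly injecting a fixed top-k. The `embed` helper, the 0.75 threshold and the cap of 3 documents are hypothetical choices; the documentation only gives the general advice.

```python
# Sketch of the "include only the most relevant context" tip for RAG with o1-style models.
# `embed`, the 0.75 threshold and max_docs=3 are hypothetical placeholders.

from typing import Callable, List

def select_context(
    query: str,
    documents: List[str],
    embed: Callable[[str], List[float]],
    threshold: float = 0.75,   # only keep documents that are clearly relevant
    max_docs: int = 3,         # and cap how many we inject at all
) -> List[str]:
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in documents), reverse=True)
    # Instead of always taking the top 10, keep only documents above the threshold.
    return [doc for score, doc in scored[:max_docs] if score >= threshold]
```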

Speaker 3:

The other thing that I also saw, from a YouTube video actually, is that they made an analysis of the training compute and the inference compute, right. So o1, because of the chain of thought, takes a lot more time to generate a response, and it takes more compute. I think in a way they're, how do you say, correlated, right? If it takes more time, it's probably because it's taking more time to compute stuff. And they said one thing that was remarkable about this model is that it shows you can compensate with inference-time compute and overcome the performance of models that use compute only during training time. And actually in that video they were kind of saying, oh, this is such a good thing, right, because we're going to decrease the compute cost.

Speaker 1:

I'm not sure about that, though. It's like, basically, you do an investment: you spend more compute at training time, but that's fine because lots of people are going to use it for inference, so you need less compute per inference. Or you spend less during pre-training, during the training phase, and more during inference.

Speaker 3:

Indeed, there's a bit of a trade-off. And I think what's hard for me is that a lot of people use ChatGPT, I'm sure. So I would think at first that switching the compute to inference is probably a bad deal, you're probably spending more compute there overall. But then, on the other hand, I have no idea how long it took to train the other models.

Speaker 2:

Yeah, but in general this is a bit of a remark If you're ignoring the train versus inference, this will probably take more compute, right?

Speaker 3:

For o1. o1 versus GPT-4o. Well, overall, I don't know, right? If all three of us are using it, all things equal, we'd expect...

Speaker 2:

I mean, that's probably what I would guess, but I think it's a bit short-sighted in a sense. I think if you would use o1 today for all your problems, you're wasting energy, basically, you're wasting compute. But to make a good comparison on complex tasks, you need to compare: how quickly do you get to an answer on o1 versus GPT-4o? Because with GPT-4o you're going to ask a lot of follow-up questions. So I think that is the only way to honestly compare the actual inference time.

Speaker 3:

That's true, and I also think there are some tasks that, for sure, o1 is not going to be suited for, right? Like, anything where you want a fast response, o1 is already out of the picture. If you have GitHub Copilot, so that's the coding assistant, right, and every time you want to get that autocomplete it takes like six seconds or whatever, then I'm not sure how fit it is for that. True, yeah. So I think, even by that alone, my opinion is that o1 is not going to replace the GPT models.

Speaker 1:

They are going to coexist, right. And I think we might go towards, like, one model that kind of just knows: for this prompt, I don't need any thinking, I can just straight fire an answer. And for this more complex prompt, I'm going to trigger this reasoning procedure, start reasoning about it and spend a bit more time on compute.

Speaker 3:

That makes sense. Yeah, there was a project as well, I don't know if we covered this before. It's called, I think, LLM Router.

Speaker 2:

I don't know if it was exactly for that. I think it was more like: for simple prompts you use a very cheap LLM, and for the more complex things you take the more expensive one.

Speaker 3:

Yeah, we did, I think we touched upon this, right. And I think, yeah, now it's not just a machine learning model, now it's a system, right. And yeah, in the end, maybe it will all be merged together: GPT-4o1.

Speaker 3:

You heard it here first. And they're gonna have, like, ChatGPT-5o1 and then 5o2, 6o2, you know. Yeah, it's gonna be a mess. And the router, I think from that project I remember, the LLM router was an LLM as well. Yeah, so it's LLMs all the way down.
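For illustration, a rough sketch of that router idea: a cheap model triages whether a prompt needs deep reasoning, and the request is dispatched accordingly. The `call_llm` helper and the model names are hypothetical placeholders, not the actual LLM Router project.

```python
# Sketch of an LLM router: a cheap model triages, a reasoning model is used only when needed.
# `call_llm` and the model names are hypothetical placeholders.

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to the named model."""
    raise NotImplementedError

def route(prompt: str) -> str:
    # The router is itself an LLM ("LLMs all the way down"): ask a cheap model to triage.
    verdict = call_llm(
        model="cheap-model",
        prompt=f"Answer only SIMPLE or COMPLEX. Does this request need multi-step reasoning?\n\n{prompt}",
    )
    chosen = "reasoning-model" if "COMPLEX" in verdict.upper() else "cheap-model"
    return call_llm(model=chosen, prompt=prompt)
```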

Speaker 2:

Huh, it's the future, man yay very cool, very cool.

Speaker 3:

So you tried the o1, Bart, yourself? I tried it a bit, yeah. Anything that surprised you after reading all the stuff? Was it as good as you thought? Was it not as good?

Speaker 2:

What I'm gonna say is very subjective. But it was a very specific thing I was doing yesterday. I was using GPT-4o, with a range of dates, and per date a description.

Speaker 2:

We can ignore that. But there were duplicates in the dates, so there were overlapping ranges, and based on the prompt they shouldn't have been there. With GPT-4o, I then prompted: ah yeah, but there are duplicates, please remove them. And did it remove them? It did not. And I have the feeling, but like I said, this is very subjective, that this was not the case six months ago. I have the feeling that the performance used to be better, because I've done a very similar exercise before. And I then did it in o1, and that worked.

Speaker 3:

So, do you think, and this is very conspiratorial, you know, that they bumped down the GPT-4o performance?

Speaker 2:

Well, probably not specifically for this, but those rumors have been there before, that GPT-4 was decreased a bit in performance to free up compute. I mean, those are the rumors that are going around. I do sometimes have that feeling, but it's super hard to compare. Because for this exercise I was doing now, I've done exactly the same thing a few times before, so I can compare it a little bit, and I really have the feeling that it was better.

Speaker 2:

But you also have this every time something new is released, you have this wow factor. I think the wow factor doesn't really hold up six months later.

Speaker 3:

How you remember it, right. And I think also, if you're using ChatGPT, like the actual UI, there's the whole temperature and all these things, right, so the responses are not really deterministic. There's some randomness to it, and it is possible that if you ask the same thing five times, one of them goes off. So it's a bit tricky to say these things. But is o1 your default after this?

Speaker 2:

So actually, no. It's also because it's not the default if you open ChatGPT. I see. And it doesn't support tools yet, so it doesn't support tools.

Speaker 3:

Yeah, and it's more expensive, no? The o1?

Speaker 1:

I think it was a lot more expensive.

Speaker 3:

Yeah right.

Speaker 1:

You also pay for the thought tokens. I thought that was interesting. Yeah, because you have no control over these, you don't see them.

Speaker 2:

But can you cap them? I think you can say: at most this much, max this much. I didn't know that. But you're talking now about the API, right?

Speaker 3:

Because in the ChatGPT UI, yeah, that's true, it's a fixed monthly fee. I think it's the API, I think, because, I mean, I didn't use it, I'm going off of the stuff I read, I guess it's the API, you know. Cool.
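For the curious, a hedged sketch of what this looks like in the API around the time of the o1 preview: reasoning tokens are billed as part of the completion and reported separately, and `max_completion_tokens` is the knob that caps the total (visible plus hidden) output. This assumes the OpenAI Python SDK fields as documented for o1-preview at the time; check the current docs before relying on them.

```python
# Hedged sketch: inspecting reasoning-token usage via the OpenAI Python SDK,
# assuming the o1-preview API fields as documented around this episode's date.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    max_completion_tokens=2000,  # caps visible output *plus* hidden reasoning tokens
)

usage = resp.usage
print("total completion tokens:", usage.completion_tokens)
# Reasoning tokens are billed but never shown in the response text.
print("hidden reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)
```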

Speaker 1:

Yeah, there are some other interesting things, for example these benchmarks that they show. These are, of course, ones where it's very, very structured reasoning.

Speaker 3:

What do you mean by structured reasoning? These benchmarks, you mean?

Speaker 1:

Yeah, yeah. It's math, for example, where it's very clear: this is a good step of reasoning, this is a bad step of reasoning.

Speaker 1:

And so, of course, there are again some rumors about how they trained this. A while ago they released a paper where they say they have a dataset with 800,000 annotated reasoning steps. So they have a dataset with examples of reasoning steps, where each reasoning step is scored, like: this is a good reasoning step, and probably they also have some bad reasoning steps. And the idea is probably that they use this in some way to train this. But of course, whatever was in that training set, those types of reasoning are something the model might have picked up on, while other things it might not have.

Speaker 3:

What do you mean by that? There are some things the model picked up on and some things it didn't?

Speaker 1:

Yeah, so this mathematical reasoning, it's, how should I say, mathematical reasoning is very rigorous, very structured. The same steps occur in different problems, and if that was part of the training set, it has seen these, it has learned these types of reasoning, so it can apply this type of reasoning at test time. And some people have said: yeah, okay, these benchmarks, but you kind of trained it for these reasoning steps.

Speaker 1:

And this is why people say there are big differences between certain domains. Just because it has learned some types of reasoning, probably, or, I should be careful with what I say, not learned to do reasoning, but it has learned to generate things that look like this type of reasoning, while not others.

Speaker 3:

Yeah, I see what you're saying. So, for example, if I understand you correctly: in the training set, let's imagine there are a lot of math problems, so the model has already seen those in the training set.

Speaker 2:

There are even rumors that it comes from the Math Olympiads, right, the submissions.

Speaker 3:

There are some rumors. And so these types of questions are very close to the benchmark. Yeah, I see what you're saying, so it's a bit inflated as well. Like, if you went even further, something like maybe chemistry...

Speaker 3:

Yeah, yeah. I mean, it's still science, still theory, it's a bit more deterministic, right? I think once you go more to the humanities, where there's still reasoning and logic, it would be even trickier.

Speaker 1:

Right. Another point is that it's very easy to verify these types of things. Like for math, we know the correct answer. So again, rumors about how they would have trained this: they just let the LLM generate some chains of thought, and the chains of thought that give a good answer, they feed those back into the LLM, while the bad ones can be discarded and the LLM can be told that those are wrong.
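To make that rumored recipe concrete, here is a toy sketch: sample several chains of thought, keep the ones whose final answer can be verified as correct, and reuse them as fine-tuning data. Everything here, the `call_llm` helper and the "Answer:" convention, is a hypothetical stand-in, since OpenAI has not published the actual procedure.

```python
# Toy sketch of the rumored training recipe: sample chains of thought, keep the
# ones whose verifiable final answer is correct, reuse them as training data.
# `call_llm` and the "Answer:" convention are hypothetical placeholders.

from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for sampling one completion from some LLM."""
    raise NotImplementedError

def collect_good_chains(problem: str, correct_answer: str, n_samples: int = 8) -> List[str]:
    good: List[str] = []
    for _ in range(n_samples):
        chain = call_llm(f"Think step by step, then end with 'Answer: ...'.\n\n{problem}")
        final = chain.rsplit("Answer:", 1)[-1].strip()
        if final == correct_answer:    # easy to verify for math/code problems
            good.append(chain)         # keep as a positive fine-tuning example
        # bad chains could instead be kept as negatives for a reward model
    return good
```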

Speaker 2:

So, because we already know the answer, you already have another mechanism to give feedback to the LLM. Interesting, very interesting. Maybe, talking about benchmarks, there's another link which I think is an interesting thought exercise, but I don't know anything about the background of this one. Yeah, that one, this one.

Speaker 2:

It's a post by Bindu Reddy, thank you, and I don't know anything about the specific test they did, but it's an IQ test that they apparently run, I think on a weekly basis, on different models, and I don't know whether it's a verified test or whatever, right. But I think on average most models still score quite high, right, around 90 points on the IQ test, and one model apparently scores 120, which is way, way better.

Speaker 3:

The o1 model apparently scores 120, which is way better, right? So maybe for people that are just listening, this is like a normal-distribution-ish thing, with the x-axis being the IQ, 100 being the middle, and then you see icons with all the different models, typically to the left of the average, and then you only see o1 at the 120 mark. Actually, so this is, maybe I feel like I should know this by this point in my life, but IQ is, people say, a measurement of intelligence, and I know there's a test you can do for it, I guess, but what are the questions actually like?

Speaker 1:

I think it also. You have different types of tests. I also think.

Speaker 3:

I also heard of emotional IQ.

Speaker 2:

I also heard of different things, but I've never taken the tests myself. It's typically logic questions, and they're typically fine-tuned to regions as well.

Speaker 3:

So the questions are localized to wherever. Yeah, like, if you're asking Brazilians: you have three lions and then two are running away, and then the police comes, stuff like that. I see, I see. Next question.

Speaker 2:

But they're validated on a geographical level. I see. Yeah, but what does this mean, if this is true? I don't know what to think about it. These are, of course, very specific questions, and probably questions that are very linked again to the things on which it did benchmarks, right, logic questions, where there's a clear right or wrong answer.

Speaker 3:

Yeah, I don't know. IQ for me is a bit tricky, because you're trying to quantify intelligence. Even when someone says, with an IQ of 120 it's officially more intelligent, right? Really saying: if it's 120, you have an IQ higher than most people, therefore you're more intelligent.

Speaker 1:

But even for me to say what that means is difficult, also without knowing any details of what's behind this.

Speaker 3:

I don't know. It's a cool, I mean, it's an interesting idea. Also, one thing: I was pair programming with someone who was more junior and used Copilot quite a lot. It was a very interesting exercise, because I could see there was a lot of code, a lot of stuff, and the code worked for some cases.

Speaker 3:

You know, it worked for that. At one point, before I even looked at the code, I said let's make sure it passes all these linting rules. And it's like, ah, okay. And then, okay, they have documentation, all these things. But then I realized that most of the time my feedback was: why are you doing this? This doesn't make sense. Why do you have that function?

Speaker 3:

And I think with these AI model things, it makes it very easy to write code, to create stuff. But the question of why you're doing this, you know, should you create a new function or should you modify this function that already exists, these are things AI doesn't help with. And then I realized there were functions that were almost duplicated, it's just that one has an input with a table and the other one reads a CSV path, and it's like, well, why don't you just add an argument? And I think maybe that's where the skills are going to have to move towards a bit: what kind of questions you need to ask, not how to write it in itself, or maybe how to write it is good to understand as well. But when I think of these things, it's like, yeah, o1 can probably write all the code that was there. But if someone did all these things, would I say, oh yeah, that person is super intelligent? Not sure, right?

Speaker 3:

I think there's a thing about designing, about what actually needs to be there. You're writing something and it's like, oh, now we need Spark, okay, I'm going to write a function for Spark. Oh, now I need CSV, okay, I'm going to write a new function for CSV. And then maybe you should take a step back and think: why am I doing all these things? Or maybe you have an edge case and you write a function to catch that edge case, or more code to handle that edge case.

Speaker 2:

But it's also because today Copilot is probably very much prompted as in: I now want to create this function for reading CSVs, for example. That's how I use it today. But if you would have a bit more, let's say, if you would have the full code base in your context, if you have a bit more reasoning, then maybe you should be challenged: are you sure you want to create this function? Because I already see this in this file.

Speaker 2:

So maybe we'll see in the future that Copilot can become a bit more of your actual pair programmer, right, which it is not today. Today, it is your monkey that writes code for you.

Speaker 3:

Yeah, no, I completely agree. I also think, I'm a bit thrown off by a big face right there, the camera automatically zoomed in on Murilo.

Speaker 3:

Go figure. But so, for sure, I definitely think, oh wow, now I'm not even in the frame, I definitely think that's a thing: to question, like, do you need this? But also, it's so easy to write code that there should be some pushback, you know, like, ah, let's write a function to dynamically generate these dates. And basically what you're saying is LLMs are still dumb.

Speaker 3:

I guess it depends what you call dumb, right? It can do a lot of stuff, but the critical questions it doesn't ask, it doesn't think, right? Like, yeah, you can create a function to generate dates dynamically, but why would you do that?

Speaker 2:

Wouldn't it be better for someone to put them in manually? But maybe, if we would apply o1 to your coding assistant...

Speaker 1:

Maybe you would get this, who knows, but it might take a while. Yeah, for me this whole discussion comes very close to what happened in chess. I don't know whether it has been discussed before, but in chess we had humans beating the computers at some point.

Speaker 1:

Computers became better and better at playing chess, so the computers won, the computers were better. And then there came this new hybrid, where there were people actually using computers: the people were doing more high-level planning, but they still used the computers to think about very deep strategy. And at some point these hybrids, so people using computers, were actually better than the computer itself. And the people that used those computers were not chess masters, so it was a completely different skill set than the real chess masters'. And it feels like we're going towards something like this. So we have the real hardcore programmers that can do it all by themselves, but then actually being able to use the tool is again some other kind of skill set.

Speaker 1:

And if you use the tool correctly, you might be able to do better than just letting the tool write the code. Yeah, it's interesting.

Speaker 3:

Yeah, yeah, I did like it's interesting.

Speaker 1:

But when you look at it like that, it's like, yeah, it's a tool. It's a tool, right.

Speaker 2:

Which requires a specific skill set to use effectively.

Speaker 3:

And it's a very powerful tool, it does a lot of stuff, but at the end of the day it's a tool, right? So yeah, when you talk about intelligence, sometimes for me that's what I think: yeah, it can do a lot, it can answer a lot, but what about asking the right question? I haven't seen that yet, right. And maybe, I mean, not saying we're not going to get there, maybe we will, but I think this is a move towards that.

Speaker 3:

I think it's a move towards that, indeed, the introspection. And one other thing I remember, I think you showed me this, Bart, I don't know if we talked about it on the podcast, one way you can actually test your LLMs is self-reflection. Basically, after the model gives an answer, you ask it: was your answer good or was it just BS? And a lot of times it can actually catch its own hallucinations, right. Which, I would imagine, with a chain of thought happens way less, right, because it's almost like that's built in. Yeah, it's an interesting point that we didn't discuss yet, but there's another improvement that people are speculating might be in there, and that is that the model might not be just generating one stream of tokens.

Speaker 1:

This is speculation, so maybe a bit of context. It has been shown that if you take an LLM and you generate multiple answers randomly, even on questions where you don't always get the same answer, the right answer is in the LLM somewhere, so you can retrieve it. Because it samples one word at a time, you randomly choose one of the possible words, then you sample the next word, but of course whatever you picked as the first word will influence the options for the second word. So, yeah, we call this a probability distribution.

Speaker 1:

It's super, super complex, but by following the chain of thought, or the token process, once, you just generate one sample, one result, while the LLM actually defines a very, very big space of possible results. If you had just picked the first three words differently, your answer could have been completely different. So where this goes is that maybe in this o1 model there might be such a mechanism. The easiest way such a mechanism could exist is: just call the LLM four or five times, look at all of these answers and pick the one that looks the best.

Speaker 3:

Or what if you did like a RAG thing? You know, you generate five things and then you say: from these answers, give me the best one. It's a bit...
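For illustration, a sketch of that "call it four or five times and pick the best" idea: a best-of-n loop where another LLM call acts as the judge. Again, `call_llm` is a hypothetical helper, and whether o1 does anything like this internally is pure speculation.

```python
# Sketch of best-of-n sampling with an LLM judge picking the final answer.
# `call_llm` is a hypothetical helper; this is speculation, not o1's known internals.

from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for one sampled completion from some LLM."""
    raise NotImplementedError

def best_of_n(question: str, n: int = 5) -> str:
    candidates: List[str] = [call_llm(question) for _ in range(n)]
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = call_llm(
        f"Question: {question}\n\nCandidate answers:\n{numbered}\n\n"
        "Reply with only the number of the best candidate."
    )
    digits = "".join(ch for ch in verdict if ch.isdigit())
    index = int(digits) if digits else 0   # fall back to the first candidate
    return candidates[min(index, n - 1)]
```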

Speaker 1:

I just say stuff yeah.

Speaker 3:

I feel like I just say stuff, but I have no idea if any of it holds, right. Like, you can have all these ideas, but then you try to code it...

Speaker 3:

It's like, yeah, it doesn't work, that doesn't work. So, yeah, it's very interesting to read about. I think there's a lot of stuff, and I do think it's very refreshing that people are still thinking about this, even though I don't know if it's a new architecture, right, there was a big discussion with the Mamba stuff, you know. But I think it's still nice that people are still thinking about it. You know, they're not just saying let's just scale, let's just scale. There's some thought going into it as well. No, they're really trying.

Speaker 1:

Another fun quote is: it's still a stochastic parrot, but this stochastic parrot can fly higher than the previous one. True. And it's just because of these kinds of new ideas and concepts that have been applied. And, true, it's still an LLM, we have not solved a lot of the fundamental problems there, but at least we figured out a step to get further in some domains. Right, still not there in all domains, and it's these little incremental steps that might give way to a completely new architecture that will maybe solve some of these fundamental problems.

Speaker 3:

True, true. So, is there maybe, I don't know if you can wrap it up, I'm not sure if there's a lot more you would like to add. Anything else?

Speaker 2:

No, but I agree with what Jonas is saying. I think the discussion is: is there anything really new here? It's probably not at a fundamental level, but they've found a way to build a system around it, which also shows that today we have performance on some specific sets of tests that is way, way higher than anyone imagined a year ago.

Speaker 3:

So I think there is definitely progress. Indeed, and that's where I also wanted to circle back. I think I asked: is this an innovation, is this a new thing? I also think that, in reality, there's not one person that just breaks the glass. Usually it's very small steps. You know, for example, neural networks in the 1900s, they were already there, right, but they weren't as popular. The BERT stuff, you know, even the GPT stuff, is from years ago, right.

Speaker 3:

So I feel like sometimes we have this perception that this is a huge breakthrough, and it is, but it's not a breakthrough in the sense that all these things are new. It's just that this was applied in a different way. Maybe there's a little thing that changed, and it's almost like a butterfly effect, you know, that little change that really broke everything open. I do think it's very innovative, even if it is just applying old ideas. Because actually, the one thing that I hear back again and again is the reinforcement learning and the chain of thought. So the reinforcement learning, like rating the LLM: the model outputs something and people say this is good, this is not good, right? So I think it's reinforcement learning from human feedback, or something with human feedback.

Speaker 3:

Yeah, that's what they used before. Which is what it appears they used here as well, with the chain of thought, as you explained. But to do this at scale, I definitely think it is innovative, and I think they should get credit for it.

Speaker 1:

Yeah, and this is what I think you also see in Anthropic research and these kinds of big companies. They have lots of resources, and what you see is that they really add to the total body of research, not necessarily by coming up with new ideas, but by validating these ideas at a very large scale that your typical research group might not have.

Speaker 3:

True, true. So we all agree that it's an innovation? I think so. What do you think, Bart?

Speaker 2:

It is definitely innovation. Incremental innovation.

Speaker 3:

Incremental innovation. Cool. I think we have time for at least one more topic. I see there's something from moody here. I have no idea what this is, Bart, but I'm gonna click it. Oh, on the topic, you thought it would be interesting for me. You thought, okay. Curious, huh, you got my attention. Let's see what it is.

Speaker 3:

You typically bring these types of topics. Let's see. The article here that I put on the screen, for people following the live stream, is "Greppability is an underrated code metric". What is this about? What is greppability, Bart? Do you know what grep is? It's the CLI, the bash thing, right?

Speaker 2:

Yeah, so, you're maybe too young for it. So grep is a CLI tool where you can say grep and then, so wait, let me rephrase: you have a file and you want to grep that file for a certain word.

Speaker 2:

So you want to basically search: show me the part of the file where this word occurs. Yes. And how easy it is to find a line of code, basically that's what the title says: that metric of how easy it is to find a line of code, maybe we should think about it. Okay, you've got my attention, say more. And they give some examples, I didn't go through it really in depth, but like: don't make identifiers too dynamic. So let's say, if something is about shipping, they give an example of a shipping address versus a billing address, don't make the address type dynamic, but explicitly mention shipping and billing, so that you can also search for it, grep for it, so you can find: okay, we're doing something here with the shipping address.
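For illustration, a small sketch of that shipping/billing point. The field names here are hypothetical, but the contrast is the one from the article: identifiers built dynamically at runtime can't be found with grep, explicit ones can.

```python
# Greppability sketch: explicit identifiers vs. dynamically built ones.
# The order/address fields are hypothetical examples.

order = {"shipping_address": "Main St 1", "billing_address": "Market St 2"}

# Hard to grep: searching for "shipping_address" never matches this line,
# because the key is assembled at runtime.
def get_address_dynamic(order: dict, address_type: str) -> str:
    return order[f"{address_type}_address"]

# Easy to grep: "shipping_address" and "billing_address" appear literally in the code.
def get_shipping_address(order: dict) -> str:
    return order["shipping_address"]

def get_billing_address(order: dict) -> str:
    return order["billing_address"]

print(get_shipping_address(order))  # Main St 1
```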

Speaker 3:

Okay, so these types of things make it easy to retrieve what you're doing where. But then it's really just to find that line of code later, literally? Yeah, it's just for that, grepping is finding a line of code. I was just wondering, because it's like the example we were giving.

Speaker 2:

We were discussing the CSV reader function. Yeah, like, call it the CSV reader function, don't call it...

Speaker 3:

A read_file function, I see. Like, it reads a CSV, but what if it's not? What if it's just a file that can have different extensions? Would it be better to have a read_csv function and a read_parquet function, or would it be better to have one?

Speaker 2:

one. Well, this is a good example, probably if you would combine one function or two functions, it doesn't really matter for this purpose. But if you support both CSV and parquet, make that explicit in your code. Like, if it's a dot csv, then do this, if it's a dot parquet, then do that. I see, I see, I see, and it takes a little bit of the magic out of your code, but and but it makes it very easy to, makes it maybe a bit, a little bit more uh, extensive in, uh, the amount of lines of code that you write, but it makes it much more easy to find yeah, I see what you're saying.

Speaker 3:

I think also, I heard it from Linus Torvalds, the Linux guy, that he actually said that good code, like, code should be boring to read. You know, you should just be like: yeah, okay, yeah, okay.

Speaker 3:

You shouldn't be like: whoa, what is this? Whoa, this is crazy. It should really be like, yeah. And I think it's something, I'm very curious to see if things work, so sometimes I want to try stuff, but then sometimes I read it back and I'm like, yeah, I don't think people would understand this, I don't think I would understand this in a month. I really like to write recursive functions.

Speaker 2:

Yeah, but it's not easy to understand, right? Yeah?

Speaker 3:

Yeah, I know. So, I definitely hear what you're saying. I do think it's something to keep in mind, because sometimes people try to be so smart that, like, you're too smart and no one understands it, not even you a month from now. Yeah, right, but at the same time...

Speaker 2:

At the same time you think: oh yeah, but you built something really cool. Yeah, yeah, I do that.

Speaker 3:

Sometimes I do that, like, I write something and I'm really happy with it, like, this is crazy, I'm really using all the niche things of Python here. And then I come back a month later and I'm like... Or maybe I write a comment that is bigger than the function, you know, because it's really hard to explain.

Speaker 1:

At least you write a comment.

Speaker 3:

Yeah, that's true. But then I'm like, if I have this big comment, maybe I'm not doing it right. You know, again, if code is boring, it's almost just like reading English, right? If you need to explain it, then maybe it's too complicated. I don't know. I mean, sometimes it does need to be complicated, but I'm not sure. What do you think about this?

Speaker 1:

Yeah, I'd like to challenge it maybe a bit, because some code practices, like Java object-oriented programming, they say explicitly do not do this, because you do not want to define upfront that you'll be able to read the CSV, that you'll be able to read the Parquet.

Speaker 3:

You should have like a base class or something.

Speaker 1:

Yeah, yeah. They kind of say you don't have to specify this, you solve it with subclassing, such that upfront you just have a file reader, and later, if I want to add another file reader, I'll make a subclass of that: a Parquet file reader, a CSV file reader. But in the class that uses that file reader, you have no clue what kinds of file readers are supported. And that's by design, because you want to keep it super flexible.

Speaker 3:

So I think even then, in that case there would still be another piece of the code that would specify later.

Speaker 2:

I don't think this would exclude that approach, right? You can still be explicit in the implementation of your base reader class, right?
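For illustration, a sketch of how the subclassing approach and greppability can coexist: the base class stays flexible, but each format is still spelled out explicitly where it's implemented and registered. The class and function names are hypothetical.

```python
# Sketch: a flexible base reader, with each format still explicit and greppable
# in its own subclass and in the registry. Names are hypothetical.

from abc import ABC, abstractmethod

class FileReader(ABC):
    @abstractmethod
    def read(self, path: str) -> list[dict]:
        ...

class CsvFileReader(FileReader):          # grep "CsvFileReader" or ".csv" finds this
    def read(self, path: str) -> list[dict]:
        import csv
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

class ParquetFileReader(FileReader):      # grep "ParquetFileReader" or ".parquet" finds this
    def read(self, path: str) -> list[dict]:
        raise NotImplementedError("left out here; would use pyarrow or similar")

READERS = {".csv": CsvFileReader(), ".parquet": ParquetFileReader()}  # explicit mapping
```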

Speaker 3:

That's true. I think it's interesting. I mean, I guess for me, I never ran into code where I had a bug and I'm like, ah, I can't find that line. Maybe it happened actually more in GitHub repos or something, where I'm looking for stuff in a code base that I don't know.

Speaker 2:

That's what I typically encounter.

Speaker 3:

I think so. It's like, if there is, I don't know, I have an issue with a package and I want to find where something is on GitHub, because I don't want to pull it and do this and that, or if I don't have it in my site-packages and stuff, then I think it is more relevant. But I never really thought of refactoring my own code because of that.

Speaker 1:

No, but I think it goes in two directions: if you write good code, explicit code that's easy to understand, it will probably be greppable, and if you optimize for greppability, it will probably also be quite readable.

Speaker 3:

There's a connection there. That's what I was thinking, I'm wondering if that's true. Because, for example, if you think of greppability and you think of DRY, don't repeat yourself, I think I even saw it here in this article at some point, "looks nice and DRY, but for maintenance...". So I feel like there are some things that conflict a bit, because if you're more explicit, then maybe you're repeating yourself a bit. And I'm not saying that everything should be DRY, right, there's no golden rule like you have to do this, but, yeah, I'm not sure.

Speaker 3:

I'd have to think about that, to see if that's true.

Speaker 3:

One thing that I have thought about, and I even thought of doing a presentation on it, and this is kind of how I approach code, and I'm maybe being a bit clickbaity here: the only thing I care about when I'm writing code, what makes code good or not, is how much stuff you have to keep in your brain when you're reading it. And I think the less you need to keep...

Speaker 3:

If you have to keep few things in your brain to really understand what it does, then it's good code. And I think it's a thing that is vague enough that you don't make any absolute truths, but I do think it helps me to think about this, right? So, if you have something that is a bit dynamic and you don't repeat yourself, so you don't have to keep track of two things at one time, less things in your brain, great. But if it's something so complex that you have to go and look for other stuff in other places, then I think it's too much.

Speaker 3:

I mean, yeah, you can talk about DRY, you can even talk about the Zen of Python, like explicit is better than implicit. I think it all kind of boils down to this. That's the only thing you need to remember: just try to write your code so you have to keep the least amount of things in your brain.

Speaker 1:

It relates a bit to modularity of the code, right, like you want to have, like independent pieces, and if you're in such a piece you only have to think about that piece.

Speaker 3:

Indeed, but I think again for me like this is for me.

Speaker 1:

I translate all of that to.

Speaker 3:

If your code is good, it keeps fewer things in your brain.

Speaker 2:

I think that's a nice summary, yeah.

Speaker 3:

Nice goal. I'm thinking of doing a presentation about this, like how there's always context and some stuff you need to keep. That's why I think Pydantic, for example, is really good, because basically you're parsing JSON and dictionaries: you need to know the keys, you need to know all of this.

Speaker 3:

But if you're using Pydantic, you have type hints, which is context that you probably already have because you're a Python developer, right? So you have to keep fewer things in your brain. You don't have to specify C types or Cython types and all these things. And sure, maybe you need to learn Pydantic, which is true, so that is something you need to keep in your brain. But if you can reduce that as much as possible, that's how you make your code better.
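As a rough illustration of the Pydantic point (a minimal sketch, assuming Pydantic v2; the model and its fields are made up):

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str | None = None  # optional field (Python 3.10+ union syntax)

raw = '{"name": "Ada", "age": 36}'
user = User.model_validate_json(raw)  # parses and validates the JSON in one go
print(user.age + 1)  # typed fields: no need to remember dictionary keys
```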

Speaker 3:

That's the premise of the talk. That's also why I don't think global variables are good, unless it's something like pi, which is a concept everyone knows, so it's again context you should already have. But if you just have a variable like FOO, all caps, at the beginning of your file, and then you see it popping up in the middle of a function, you're like, wait, what is this? And you have to go back. That's something else you need to keep in your brain, you know. So I have a lot of arguments, and when I was reviewing someone else's code and trying to motivate why I think this is better than that, I think it all came back to this: keep fewer things in your brain.
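A tiny, hypothetical example of that FOO situation versus passing the value explicitly (names and numbers invented for illustration):

```python
# Version 1: the reader has to scroll back up to find out what FOO means.
FOO = 0.21

def total_price(price: float) -> float:
    return price * (1 + FOO)

# Version 2: the parameter name carries the context with it.
def total_price_with_vat(price: float, vat_rate: float = 0.21) -> float:
    return price * (1 + vat_rate)
```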

Speaker 1:

That's interesting. Yeah, that's cool.

Speaker 3:

Yeah, so using it in that way. So I'm planning to, I think I'm gonna... What do you think, should I elaborate? You should do it. Okay, I'll do it. Rootsconf, that's good, I'll try to cover something. I feel like every time I tell Bart something, it's like I signed it, you know, because I know tomorrow he's gonna be like, oh, how's the talk going?

Speaker 2:

It's a very interesting one and I think it goes beyond this. You have tons of best practices, but why do you want to follow them? Because you want to keep fewer things in your brain. Like, why do you use a formatter? Because you don't want formatting to be a concern.

Speaker 3:

Exactly. With a lot of these principles, I've seen people saying, oh yeah, but it needs to be DRY. And it's like, okay, but now I'm making this function so dynamic, and so... no. There's always a little asterisk next to the principles. But for keeping fewer things in your brain, I haven't seen exceptions yet. For me it's been a very good guiding metric so far.

Speaker 2:

So you're gonna do a presentation on this and then afterwards a book.

Speaker 3:

Yes, nothing in my brain, yes. That'll be on my tombstone, you know. But I do think it even goes beyond Python, right? Is it programming in general? Yeah, exactly, I agree. I can go on and on, like pandas, there are five different ways to do the same thing, that's not good. But it doesn't necessarily match with best practices in all languages, for example.

Speaker 2:

So in Go there is a movement that says you should use very short variable names for some stuff, even two-letter variable names. It's not very explicit, right? You need to keep it in your brain. But do you agree with that? Well, that's something else, but what I'm...

Speaker 2:

...trying to say is that it does not necessarily match with all best practices, but I think it's a fair overarching principle.

Speaker 3:

But if we had a conversation where you were reviewing my Go code, and you said, why, instead of putting birthDate, just put x, and I'd say no, I don't want to do that because it's one more thing I need to remember, what would you say to that? Well, personally you would agree? Personally I'd say birthDate, yeah. Okay, cool, I'll work on it. I was a bit on the fence, but I think now I'll submit it. Thanks. Maybe we can do a hot take.

Speaker 3:

Let's do a hot take. Can you do the hot, hot, hot, Alex? Oh, hot, hot, hot, hot, hot. This is like when you took a slice of pizza

Speaker 3:

that just came out of the oven. Actually, fun story. When I was young, like really young, my mom tells this story, I don't remember it, but she said, don't touch the stove, it's really hot. And when you're a kid, if you have milk or something, you taste it by putting it in your mouth, right? So I put my tongue on the oven and it burned my tongue, and I never did it again. And that's what it took for you not to put your tongue there? Yes. You might be surprised, but I was a very stubborn child. So, yeah, every time I hear the hot, hot, hot, I'll think of that now. Same for me now. Cool. So, you want to go with yours, Bart, the one that you put here?

Speaker 2:

We can, we can. You're the boss, but it's a bit of a sensitive one, I think, especially for you. Let's do it, let's see it. I think you are a major fan of pre-commit. I'm a fan. I will say pre-commit is very overrated. Maybe you need to explain a bit what pre-commit is.

Speaker 3:

So pre-commit: when you're working with Git, if you want to save, quote-unquote, changes, you make a commit, very simplistically speaking. And you can have these pre-commit hooks, where a hook is basically a piece of code that runs before something happens, in this case before a commit. So, I'm making analogies here, right: if I want to save a file, before saving that file, run this program that checks whether the file has any trailing whitespace, and if it does, it won't let you save, it won't let you commit. So it can be things like that.

Speaker 3:

In practice, what happens is that some pre-commit hooks try to modify a file, and if the files are modified, it won't commit, but you can see what changes were made, right? So if you have formatters and linters like Black or Ruff and all these things, basically at every commit it will format the files for you. So, what do you think, is that a good enough explanation for you to place your hot take?
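As a rough sketch of what such a hook boils down to, here is a standalone, hypothetical check script in the spirit of the whitespace example, not the actual pre-commit framework itself: pre-commit would invoke something like it with the staged file names and block the commit on a non-zero exit code.

```python
import sys
from pathlib import Path

def check_trailing_whitespace(paths: list[str]) -> int:
    """Return 1 if any of the given files contains trailing whitespace."""
    failed = False
    for name in paths:
        for lineno, line in enumerate(Path(name).read_text().splitlines(), start=1):
            if line != line.rstrip():
                print(f"{name}:{lineno}: trailing whitespace")
                failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    # pre-commit passes the staged file names as command-line arguments
    sys.exit(check_trailing_whitespace(sys.argv[1:]))
```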

Speaker 2:

So it's a set of tools that you define, where you say: these are the things I want to run before doing the actual commit. Yes, on your code base. Yes. And maybe you should also explain how these tools, like a formatter, how are they installed?

Speaker 3:

So you want the virtual environment stuff? Yeah. So there is a package called pre-commit that is written in Python. The thing with Python is that it's an interpreted language, right? So that means that if you want to install a package for your project, you have to make sure it's isolated. If I want to install pandas for a project I'm doing now, but in a month from now there's a new version of pandas, I want to be able to have a different version for that other project, right?

Speaker 3:

Pre-commit hooks, a lot of times they're written in Python, which means that if you were to install them in your project, now they're dependencies of your project and maybe there are conflicts as well, right? For example, if I want to install Black: Black depended on tomli, which was an external package, now it's built into Python as tomllib, and maybe I have tomli at a different version, so maybe there's a clash. So what pre-commit does is that for each hook, it creates a separate environment. Does that, uh...

Speaker 3:

Yeah, yeah, all right. I feel like, how do you say, I've been signing my own death certificate, you know. You've been setting the stage. Yeah, for Bart. It's like if I'm a volleyball player, I'm setting for him. So I think it's nice as a tool, right? Okay, you say: this is a set of tools that I typically use.

Speaker 2:

Please run them for me, and maybe even as a tool I can share with the team that is working on the code base. Okay. But the fear I have is that it's becoming a bit of the standard for CI, which I don't think is correct, because you have these pre-commit checks, a lot of them, and I think the source of truth for "is this okay" should not be in the pre-commit but in GitHub Actions, or whatever your CI environment is. And if you say that, then pre-commit becomes a bit of double accounting, because you need to do this in your pre-commit, you need to do this in your CI, and you need to keep both up to date. Yeah, so I see... what, well?

Speaker 2:

So that's it: you need to make sure that you're running the same checks in your pre-commit that you're running in your CI environment.

Speaker 3:

Yeah, so your argument is that the source of truth, let's say, should be in the CI and not in pre-commit, not in this random package that you're using locally. Yeah, I see your point. What I do today, and what I'll probably continue doing: pre-commit is a package and you can actually run it in your CI. You can tell pre-commit to run all hooks.

Speaker 2:

You can, as long as... but this is actually why I'm bringing this up. It is super hard to do that when you're testing in your CI, and I'm talking GitHub Actions now, just to make it very concrete.

Speaker 1:

When you're using GitHub Actions.

Speaker 2:

You have a matrix environment and you want to run against multiple different Python versions, okay, and you're using Rye, then pre-commit doesn't work out of the box. You need to do a lot of custom stuff to get the right Python versions installed, et cetera. Hmm, interesting. And it's very hard to debug, because you don't really know what's going on: pre-commit creates its own virtual environments, everything is running in another virtual environment, and you're not sure which Python version it's actually using.

Speaker 2:

So what you're saying is that if something goes wrong, it's hard. And then you get into the discussion like, maybe you should not do a matrix Python test in your GitHub Actions, because you can also run against multiple Python versions in your pre-commit. Yeah, you can specify that as well, you can tell pre-commit to use only 3.12 or multiple ones.

Speaker 3:

I've never used it, but it's possible.

Speaker 2:

But it makes it, in my eyes, very, very opaque: what Python version am I actually testing against? But what...

Speaker 3:

...what I have done is that I have two CI jobs, one for linting and one for testing and all these things.

Speaker 2:

Sure.

Speaker 3:

Why wouldn't you do that instead of running all the hooks with different python versions? Like, why do you need to run the hooks with multiple python versions?

Speaker 2:

And you're saying the linting is fine to do with one Python version? Yeah, I think it depends on the hooks you have.

Speaker 3:

So then again, we get a bit specific to your project. But if it's checking whether there's whitespace, I think it's fine. If it's checking Black rules, like single quotes versus double quotes, it's fine. I think it's only if you get to the more... But even things like typing, for example, now you have, I think, after 3-point-something... But why make it...

Speaker 2:

...something extra that needs to be in your mind? Because you're setting up a Python version. He's using my words against me.

Speaker 1:

This is why I wouldn't know.

Speaker 2:

This is like you're saying: okay, I have this Python package, I'm going to test it against Python 3.10, let's say something random, right? Okay, then I need to make sure that my pre-commit hooks test against the same Python version, because I don't want any side effects. Maybe something changed, even if it's just a formatter. So I need to specify that in my pre-commit too, so that it also targets 3.10. Why add that extra complexity?

Speaker 3:

But I guess for me, so you're right, this is for packages, right? Because if you're testing across multiple Python versions, it's probably a package. Yeah, exactly. If it's an application, there's no need to test for multiple versions, just test for the environment you're going to run in. Not necessarily. Okay, but you're writing code and your local environment is going to be on one Python version. And if that version doesn't have the pipe operator for union types yet, you're gonna write Union, whatever, right? The file is gonna be the same. So the pre-commit hooks, the linting and all these things, should be run with that version, because... You're explaining that pre-commit creates...

Speaker 2:

It creates a virtual environment per tool because there might be a clash with your Python packages, which is very, very much an edge case, right, let's be honest. So what you're doing, even if we just look at it locally: you have this Python virtual environment in which you run your project, and next to that you have potentially tens of different other virtual environments that might be on the same Python version or might not be, and might create side effects or might not. Why create that extra complexity?

Speaker 3:

Yeah... I'm not sure if I agree with that.

Speaker 2:

First, I'm not sure if I fully agree with the clashes, because the alternative is just to install them as dev dependencies, right, and just have a script that calls these tools. You could do that, yeah, run whatever. You could do that.

Speaker 3:

I think the thing is, for every dependency you have, you also have all the dependencies of that dependency. And if you have something like Black, or something that is C-based, it's fine, but otherwise you're actually bringing in a lot of other dependencies as well.

Speaker 2:

And then you could have clashes.

Speaker 3:

But this is very far off, like potentially in the future there might be... I did have issues with it, that's the thing, like with Databricks. Is it worth the extra things you need to keep in mind? But that's the thing, then it's a trade-off of keeping things in mind, if that's my guiding rule. Because on one hand, you have to remember that in CI you have two jobs, that's the way I would think of it: one is for linting the actual files, so anything that actually changes the files is one job, and everything that is testing across multiple targets is another job. It has worked; I never had issues with this.

Speaker 3:

And then, yeah, if there are issues, then you have to remember all the context of virtual environments and isolated environments.

Speaker 3:

Otherwise, if you bring it to your dev dependencies, then you have to remember that these dependencies may also impact these other things.

Speaker 3:

So if you say there's a clash because of this and that, you have to remember that you're using this dependency because you're running this linting hook, and you're still using pre-commit to run your local dev dependencies there as well. It's a different setup that you also have to keep in mind, right? Because even the pre-commit package is not set up for doing it that way. If you look at the pre-commit hooks you have today and you run stuff locally, you have to set pass_filenames to true or false, and you have to know that for pytest you don't pass file names, and what does that mean? You have to know what kind of file types you have, you have to know the difference between a local hook and a Python one. Maybe it's because of the way the tool is set up, but I do think there's a lot of stuff you need to keep in mind if you're using it the way you're proposing.

Speaker 2:

What is the way I'm?

Speaker 3:

...proposing? So, having everything as a dev dependency? Okay, well, maybe, I don't know if that's what you're proposing now.

Speaker 2:

The only thing I'm saying is pre-commit should not be the be-all and end-all of CI. I think CI should be separate. You should do it within your project context, and maybe you create a virtual environment for a specific tool if you need it. But I think it creates a lot of extra complexity and a lot of hidden things if the default in your CI is just to run your pre-commit. I think that's the main point, that it's hidden. But hidden in the sense that it has a virtual environment for each tool? Yeah, but if you don't know how this tool works, you might just think, I'm gonna run Ruff here.

Speaker 1:

Yeah, and just say, ah, because it's very easy: you copy-paste something from Ruff, whatever they propose, you copy-paste it into your pre-commit and you put it in your CI, but you might not really know what's going on behind the scenes. But then isn't that a trade-off, because when you do it yourself with dev dependencies... Let's take a very concrete example: you're running Ruff against your code base. Yes.

Speaker 2:

In GitHub Actions.

Speaker 3:

Okay.

Speaker 2:

In a matrix test. In your GitHub Actions you set up Python for 3.9, 3.10, 3.11 and 3.12. You're running your pre-commit there, because you defined your pre-commit Ruff hook. You're thinking to yourself, oh, I'm testing against these versions, but you're actually not, because in your pre-commit hook you specified 3.9 or whatever, and it's always going to be 3.9.

Speaker 3:

So I'm not sure if that's a good example, because Ruff is for changing the files, right? So again, maybe Ruff is going to say you shouldn't use Union, you should use the pipe operator. But Ruff can't say that for every version, because for 3.12 you will have the pipe operator for union types and for 3.8 you're not going to have it, and you cannot change the file twice. If you wrote it thinking of 3.10, you have to lint it with 3.10. You're gonna...

Speaker 2:

You're gonna assume, unless you have this knowledge about pre-commit in your brain, that you're running Ruff within the Python version you're testing at that point. Which you're actually not, because there's a hidden virtual environment with a specific Python version that you specified in your pre-commit setup.

Speaker 3:

I'm not sure I would think that, because to me, again, if Ruff is changing files for this target or that target, I wouldn't want to do it for multiple environments. So that's probably why I wouldn't even put it in the same job.

Speaker 2:

That's even weirder. Why would you not want a formatter or a type check or whatever to run against the version that is actually relevant to you?

Speaker 3:

Because when you're writing something, you're writing for a certain version. I'm writing 3.8 code or whatever, right? That's the thing. And there are some features that are available in 3.10 that are not available in 3.8. tomllib, for example, is a good one. tomllib, I think it came into Python in 3.11. So from 3.11 on I can do import tomllib, and on the other one I have to import tomli.

Speaker 3:

Right, it's a package that from 3.11 became part of the standard library. But when I'm writing the code, I have to write it thinking of the Python version that I have, and linting should also be in that vein. There are some features that are available in 3.12 that are not available in 3.11, 3.10, 3.8. What's it called? The match-case statement, for example, that's only available from 3.10. So if you're writing something for 3.9, you cannot use that at all, unless you do some __future__ import or whatever, because it's something that was built afterwards. But you know what I'm saying: when you're writing code, you have to write it thinking of a Python version, and linting should also be on that Python version. And we're talking just about linting. I think there's a bigger argument if you run pytest in your pre-commit, there are other things you could do there, but for linting I don't see it. Pytest is even much more.
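For reference, the tomllib/tomli situation usually looks something like this, a small sketch of the version-dependent import being described (the pyproject.toml path is just an example):

```python
import sys

# tomllib is in the standard library from Python 3.11 onwards; on older
# versions you fall back to the external "tomli" backport with the same API.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # requires `pip install tomli`

with open("pyproject.toml", "rb") as f:  # both expect a binary file handle
    config = tomllib.load(f)
```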

Speaker 2:

Pytest is much more, yeah. But I think there are other things you could do, and maybe it's a bit case by case for us to discuss. But even linting, like type hinting, for example. Type hinting is optional, right? Yes. You want to get hints, let's say tips from your linter, based on the version that you're testing against, because there are a lot of things that are backwards compatible, but you want to bring them up to speed with the latest version. So, the way I approach...

Speaker 3:

It is like you take the lowest version that you have and that's what you use.

Speaker 2:

But, for example, capital-L List... Even if we ignore all of this, that there are multiple versions: if you do it like this and you say, we just set up Python 3.12 in GitHub Actions, and I also have my pre-commit, there is a possibility that you didn't update your pre-commit to 3.12. That is true. Let's keep it as simple as that: there's a hidden context.

Speaker 3:

That is true, that is true. And again, I'm not saying that pre-commit in CI is, wow, perfect. I do think it's the best tool that we have, the one that makes my life the easiest. And maybe it is additional context, but I already have it in my brain; for me, this is normal. Yeah, I think what I'm trying... I was also trying a bit to make an argument here. To me, in my CI pipeline...

Speaker 2:

I want it to be very explicit: what is it that I value in my CI? And just a "please run all my pre-commit hooks", which is basically what you're saying, is not explicit in the pipeline, because you don't see in the pipeline at all what it's doing, it's just running all your pre-commits. So it's not explicit in the CI. And there's also a hidden context in your CI, because there are different virtual environments.

Speaker 2:

And I'm saying the CI needs to be explicit, so don't use pre-commit as a wrapper to do that. I see what you're saying. But you can still use it as a tool locally, right?

Speaker 3:

If I understand what you're saying: you think it would be better, so people keep fewer things in their brain, if they go to your CI, read it, and know what's happening. And with pre-commit, now you have to go into the project, into the pre-commit files, so there's more context you need to know. I agree with that argument. On the other hand, for me, the reason I started doing this is because I had to keep fewer things in my head, in the sense that I have something in my pre-commit hooks and I know it's gonna run this. I know it's gonna run everywhere, I don't have to make sure...

Speaker 3:

...that it's the same in multiple places. So in that sense, I keep fewer things in my head, and that's why I opted for that. You are sure that it's consistent?

Speaker 1:

I'm sure it's consistent. I'm sure it's not gonna impact.

Speaker 3:

I'm sure I'm not gonna have a weird NumPy sub-dependency there because I have this hook or something, you know. It's not like I have guarantees, because things can always break, but I have this there, I know it's not gonna mess up the rest of my project, I know I can change versions very easily: I change it here and it also goes to the CI.

Speaker 3:

For me it did keep fewer things in my head. And also because the pre-commit package, the way it's set up, if you want to do something custom, it's not as easy: you have fail, you have pass_filenames, you have local and python as options for the language, which even today is not 100% clear to me, you know. So maybe if those things were different, I'd have a different opinion as well. But today, I do agree there's a bit of context you need to get, and if you're in the CI, you need to know that the CI is using pre-commit, and you need to know what pre-commit is. You need to know this.

Speaker 3:

But for me it's something that has helped me, you know. Because even with things like, oh, you can do pipx install black and run Black there, to me it's like, if I'm going to do that, why don't I just have it in the pre-commit and get a separate virtual environment? Because I also saw all these other suggestions, like, oh yeah, you can pipx this and do that, or you can brew install black, you know. And it's like, well, if I just have it there in pre-commit, it's just easier.

Speaker 1:

Just, what I want to run is there, you know. Yeah, when starting this podcast I was a bit worried that I would go too deep into o1, but we found a way.

Speaker 3:

We found a way. Cool, interesting discussion. So I think, with that, it's time. What are you guys planning to do next weekend? How's your week looking ahead? I'm not sure if that's too abrupt, I wanted to wrap it up. Next week is still so far away. It is a bit far away now, but let me finish up.

Speaker 1:

Last weekend I read The Phoenix Project. Ah, you did. On DevOps. And I know DevOps, but it was still very, very interesting to read, just to see the original thought behind it, what the original problem was. Because we don't see it anymore: DevOps is usually just there, we don't see this big separation between dev and operations anymore. So it was very interesting to read about the context that originally developed it, and it actually gave me a deeper understanding of all of these concepts. And I think it's also well written, he really captivates you, you know, it's very action-based.

Speaker 3:

I also read The Unicorn Project. The guy wrote it afterwards. Not as good, but still. Oh, it's a follow-up? It's not really a follow-up, he wrote it after The Phoenix Project, but the stories are connected. I thought The Phoenix Project was better, but still interesting, still a good read.

Speaker 1:

Yeah, and it's very niche. It's written for our kind of people, exactly. Yeah, but when you read about all of these crises, these firefighting moments, you really feel it, because you know how bad it is.

Speaker 3:

Well yeah, yeah, that's not for everybody.

Speaker 1:

Yeah, I couldn't give this book to my sister and say, look, this is great, you should read it. Cool.

Speaker 3:

So next week you're gonna read The Unicorn Project? I don't know. You should also read The Goal. Yeah, yeah, The Goal, I think it's about manufacturing. Yeah, but the guy even mentioned that The Phoenix Project was a bit inspired by it?

Speaker 1:

Yeah, yeah, they mention it in the book.

Speaker 3:

I read that one too, it's nice. Cool. What about you, Bart?

Speaker 2:

next weekend?

Speaker 3:

Yeah, I don't know yet, nothing special planned. Okay. I don't know either what I'm gonna do. And I wasn't fishing for you to ask me, I just don't know what I'm going to do.

Speaker 2:

No, I'm trying not to have too much planned.

Speaker 3:

That's good. That's a good plan. That's a good plan to not have too much planned. What about you, Alex?

Speaker 3:

I also don't have too much planned, no idea. Okay. Maybe one piece of exciting news before we wrap up, for the podcast: we're looking into bringing in some external guests every once in a while. So you're still going to see the usual bandits, but every once in a while, maybe once a month, to be seen, no commitments there yet, we'll bring in someone external to do a deep dive, people from the data and AI community.

Speaker 2:

So product founders, yeah. So if anyone has...

Speaker 3:

...any suggestions, thoughts, anyone that you think would be a good fit, feel free to send it our way. Yeah, to be seen, let's see how it all plays out, but you can expect that from us going forward. Looking forward to it. Yes, me too, me too. So yeah, cool, anything else? Any last words?

Speaker 2:

say something inspiring. Yeah, maybe after the outro.

Speaker 3:

You know, just... But, Jonas, thanks a lot for joining, thanks for giving us the context we don't have on o1 and whatnot.

Speaker 1:

Thanks for inviting me again.

Speaker 3:

No, no, it's always a pleasure. You always come very well prepared, very knowledgeable, so we really appreciate it, because sometimes we could prepare ourselves a bit better, if I'm being honest. So that's it, thanks y'all, thanks y'all.

Speaker 1:

I'm asking for inspiration. Okay...

Speaker 2:

I would recommend, uh, what is it? Yeah, it writes a lot of code for me and usually it's slightly wrong. Success is not final, failure is not fatal. This almost makes me happy that I didn't become a supermodel.

Speaker 3:

And then Jonas, he'll read it? No, I'll pick the best one. Ah, you'll pick the best one. It's really... Very meta.

Speaker 1:

Rust Data topics. Welcome to the data topics podcast.

Speaker 3:

So, Murilo, I'll go first. You generated an inspiring quote to end the podcast on, using GPT-4o, right? Well, actually it's GPT auto, I have no idea. I just opened it: give me an inspiring quote. So I think it's 4o. No, on the top left it says GPT auto. No idea. So for me I got: success is not final, failure is not fatal, it is the courage to continue that counts.

Speaker 1:

Winston Churchill. We should check whether it's actually... Yeah. It even stole one.

Speaker 2:

Huh, it even stole one. I made mine something specific. Oh really?

Speaker 3:

But what was your prompt?

Speaker 2:

Because maybe it was like... The prompt was: generate an inspiring quote to end the tech podcast on. Ah, okay, so more specific. And it actually says it thought for six seconds. Okay.

Speaker 3:

Not a lot of thinking, eh.

Speaker 2:

The chain of thought has three topics: it determines the goal, it evaluates how it could be used, and the third topic is how to improve technology. I don't know. But the output is: remember, the technology we create today shapes the world we live in tomorrow. Keep innovating, stay curious and never stop pushing the boundaries of what's possible. That's more inspiring, sorry. I think we need an applause for that.

Speaker 3:

No fair. But I said please, so I think my prompt was: can you give me an inspiring quote, please? All right, y'all, thank you, see you next time. Bye.
