DataTopics Unplugged: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
#44 Unpacking Open Source: A Look at GX, Monetization, Ruff's Controversy, the xz Hack & more
In this episode #44, titled "Unpacking Open Source: A Look at GX, Monetization, Ruff's Controversy, the xz Hack & more" we're thrilled to have Paolo Léonard joining us to unpack the latest in technology and open-source discussions. Get ready for an enlightening journey through innovation, challenges, and hot takes in the digital realm.
- GX Cloud Unveiled: Paolo gives his first impression on the latest cloud data quality tool: GX Cloud
- Open source monetization: Delving into the trade-offs between open-source projects, their managed counterparts, and other strategies for making open source financially sustainable, with a look at the roles of Astral, FastAPI, and Prefect in this space.
- The Open Source Controversy with Ruff: A discussion on the ethical considerations when open-source projects turn profit-focused, highlighted by Ruff.
- Addressing the xz Hack: Delving into the challenges highlighted by the XZ backdoor discovery and how the community responds to these security threats.
- Jumping on the Mojo Train?: A conversation on Mojo's decision to open source its standard library and its impact on the future of modular machine learning.
- Becoming 'Clout' Certified: Hot takes on the value and impact of clout certification in the tech industry.
You have taste in a way that's meaningful to software people. Hello, I'm Bill Gates. I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong. I'm reminded, incidentally, of Rust here, rust iPhone is made by a different company, and so you know you will not learn Rust while skydiving.
Speaker 2:Well, I'm sorry guys, I don't know what's going on.
Speaker 1:Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here.
Speaker 2:Rust Data Topics.
Speaker 1:Welcome to the Data Topics podcast. Today is the 4th of April, 2024, a Thursday. Usually we live stream on Fridays, but for personal reasons we shifted the live stream a bit. We're also live on YouTube, LinkedIn, Twitch, and X. Feel free to leave your comment or question there. I'm joined by Paolo once again, yes, and Alex behind the camera there. Hello. Today we're not joined by Bart. Unfortunately, Bart had other commitments and he said the show must go on. Actually, I heard rumors that he's in Portugal, bathing in Portugal, so maybe next time he's back he'll be super tan already. Or maybe, I heard that he's also back in the Netherlands, aka the motherland, for Bart, not for me. But when he comes back, we can see what he's been up to. Maybe it can be a topic.
Speaker 2:Maybe it can be a topic it's interesting to discuss.
Speaker 1:People want to know. Yeah, we can gang up on him and just be like, what we're doing, you know. Yeah, Paolo, so this is Data Topics.
Speaker 2:You want me to explain the concept for you? No, for me it's fine, but maybe for the person who joined two minutes ago. No, but Paolo, you've been on the pod a couple of times, yeah, and you still kept coming back. I mean, I guess against your better judgment.
Speaker 1:The people asked you're just too yeah, uh, any any life updates since last time you've been here?
Speaker 2:Life update: we had the ski trip, which was really, really nice. And, uh, yeah, the screen keeps... yeah, but it's okay, it's just for us, right?
Speaker 1:People on the other side can, can, no but we had the ski trip.
Speaker 2:It was, uh, really nice really. I thought it was a bit maybe too short this time, uh, but I don't know isn't it always too short?
Speaker 1:yeah, it is. Are you? Are you a skier?
Speaker 2:do you ski a lot? Yeah, but this time I hurt my ankle, and so I skied the first day on Saturday and I did some jumps and I was like I don't know Too many.
Speaker 1:Yeah, it's not a good idea actually.
Speaker 2:So I just stayed at the hotel the rest of the journey I see. All right.
Speaker 1:I'm glad you're still in one piece, you know, with us today. What do we have here? Maybe as a reminder for people who are seeing you for the first time, do you want to quickly introduce yourself? Yeah?
Speaker 2:Hi everyone. I'm Paolo, team lead for data quality and data management at Dataroots. So basically what I do is lead any initiatives around data quality and data management. You'll soon see a reel of me explaining what I do in more detail. Stay tuned. Yeah, Alex, any comments maybe on that? It will be available on YouTube and LinkedIn. I think you need to, I think... can you hear me?
Speaker 1:Yes, Press it again, yeah.
Speaker 2:Yeah, okay, I can hear you, yeah.
Speaker 1:It'll be available on YouTube and LinkedIn and Instagram.
Speaker 2:TikTok maybe Maybe.
Speaker 1:We'll see about that.
Speaker 2:You're getting too greedy huh, anything goes, that is true, anything does go.
Speaker 1:Talking about data quality, there is this thing called Great Expectations, or something.
Speaker 2:Yeah, yeah, I only heard about it last week. No, but yeah, indeed, Great Expectations, which was, and still is, a big player in data quality tooling, released GX Cloud, I think, two months ago.
Speaker 1:So GX is Great Expectations, right? It's just shorthand for that.
Speaker 2:They rebranded themselves from GE to GX, and they finally released... That's true, they used to be GE. Yeah, exactly, and they rebranded to GX. What a change. Wow. I prefer it to GX but, uh, you made that change. You asked, I did the pull request and I forgot to accept it, you know. Pull requests, we're gonna talk about that later. But yeah, so they finally released their cloud version. They were hinting towards this for a few years.
Speaker 2:I want to say that because the last time I spoke with their representative was, I think, in 2022, 2021 maybe, and they showed us something very different from what we can actually see now in the cloud version. So yeah, I just wanted to bring this to Data Topics.
Speaker 1:Nice, nice. So maybe: Great Expectations is a data quality framework, right?
Speaker 2:Yeah, it's a data quality tool. What it does is help you find data quality issues in your data set. It doesn't correct them for you, and it doesn't apply changes based on certain rules, so it's not, let's say, corrective: it doesn't take any corrective action. It helps you find the issue and then you act on it based on what it finds for you. It's quite complete, and that's maybe one of the bigger pain points I have with Great Expectations.
Speaker 1:It's that it's too complete. How is this a pain point?
Speaker 2:It's too good. No, but I mean, sometimes you want something that just works. And data quality, it's something I noticed, is typically not something you want to spend too much time on, so the easier the better, and actually it's the same with anything. You don't want to have like 10,000 config files to set up one data quality check. You want to have, like... I won't mention it. But with other data quality tools, you have one file to define the connection, one file to define the data quality checks you want to apply on the connection you just made, and that's it.
Speaker 1:Yeah, so basically you want a tool that stays out of your way. Yeah, exactly Right. Like if you know too much about it, it's probably not a good thing because you don't. You don't want it to be in your way, you just want it to be doing its work in the background. Yeah, it's pretty cool. And when you say it doesn't take corrective action, it's like maybe can you give a few examples on what kind of checks you can do? What do you mean by data quality? Yeah, exactly.
Speaker 2:So, basically, a data quality check checks your data against a set of rules that you define. Let's take the full pipeline. Business comes to you and says, okay, I have an issue with my data set: I'm expecting non-null values in this column. One data quality check you could then set up is: okay, I accept no null values in this column.
Speaker 1:So basically, if there are null values, there's a problem.
Speaker 2:There is a problem, and then you need to set up your process, your pipeline, to say: okay, I need to do this, this person needs to be notified, this dashboard needs to maybe be updated or put on hold.
Speaker 1:Something like this. And for another example, maybe: if you have a bank and you have users, all the users need to be above 18 years old, so you can say all the values in the age column should be above this. Exactly. And a lot of data quality... I think we had a lot of...
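The two checks described in this exchange (no null values in a column, every customer an adult) can be sketched in a few lines of plain Python. This is only an illustration of the idea of a data quality check that reports problems without fixing them; it does not use the Great Expectations API, and the `run_check` helper, the `CheckResult` type, and the sample records are all hypothetical. Treating "above 18" as "at least 18" is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: list[int]  # indexes of the rows that violate the rule


def run_check(name: str, rows: list[Row], rule: Callable[[Row], bool]) -> CheckResult:
    """Apply one rule to every row and report failures; the data is never modified."""
    failing = [i for i, row in enumerate(rows) if not rule(row)]
    return CheckResult(name=name, passed=not failing, failing_rows=failing)


# Hypothetical customer records for a bank.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},
    {"customer_id": 3, "age": 16},
]

not_null = run_check("age is not null", customers, lambda r: r["age"] is not None)
adults_only = run_check(
    "age is at least 18",
    customers,
    lambda r: r["age"] is not None and r["age"] >= 18,
)

for result in (not_null, adults_only):
    status = "OK" if result.passed else f"FAILED on rows {result.failing_rows}"
    print(f"{result.name}: {status}")
```

What happens after a failure (notifying someone, putting a dashboard on hold) is deliberately left out: as discussed above, that is a process decision that sits outside the check itself.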
Speaker 2:Like the first episode you were in, we did talk a lot about data quality, the different dimensions, the different checks. Exactly, we went a bit more into the theory of data quality. And maybe I can mention this as well: the first time I came on the podcast, I presented the article I wrote about open source data quality tools. There's a new one coming, probably next week or the week after, where Nemish and Arthur helped me evaluate new tools. In this article we focused on dbt tools, because a lot of people are starting to use dbt and it has a lot of advantages, and I think someone else can talk dbt up in place of me. But what it has...
Speaker 2:It also has some tests that you can run, but they're very limited. But now there are plugins like re_data and Elementary, two of the tools that we evaluated in the article, that can extend the possibilities of dbt. So theoretically you can have... yeah, exactly, and it's really nice actually. So, yeah, and they also have a cloud version, by the way.
Speaker 1:So when you say cloud version, right For the people that are not super in this space like, what do you mean by a cloud version? Why would you use it? What's the benefit?
Speaker 2:Well, yeah, and I will come back to this later, but the biggest advantage that I see is storing your metrics in the cloud, and then whatever the SaaS version adds on top.
Speaker 2:So a software-as-a-service version of a data quality tool can help you, and it will bring a lot of new features that are not available in the open source version, of course. But basically you can run your data quality checks in the SaaS version, show the results, and assign ownership, governance processes, notifications, stuff like that, without dealing with, in the example of Great Expectations, the thousand config files.
Speaker 1:I see. So there's also governance, in the sense that if there is an issue, the tool provides a place for people to say, okay, this person is going to solve this issue, these are the corrective measures. And also storage, right? If you're running this on your own machine, you need to put the results in a place where everyone can see them. Yeah, correct. And there is also a nice UI, I think, and that's very important.
Speaker 2:For certain tools in the data engineering space, it's not that important. Let's take MotherDuck, for example. The UI is very bare bones, but it works. Which one? MotherDuck. Yeah, it's very bare bones, but it works.
Speaker 2:But data quality is very much linked with the business, and business people typically don't like to go through files, JSON files, YAML files. And I understand them: they're not used to working with that. It's something on top that they have to manage. So having a nice UI explaining, okay, this is the check that you have, these are the results that you have, is very important for them. And if they can click and say, okay, if this data quality check fails, notify this person, if they can do that with a nice UI, I mean, it's great.
Speaker 1:Yeah, indeed. In one of the projects that I'm helping with, there's also this discussion about the cloud offering versus the alternative, which would be the open source one. And I think you really hit the nail on the head there: the decision on whether they should go for the cloud or not really hinges on whether business people are going to be interacting with it, right? So what kind of...
Speaker 1:Who are the people that are actually using this? What do they need to know, what don't they need to know, et cetera. So yeah. And so Great Expectations, in a way, is a bit late to the party.
Speaker 2:Exactly. Yeah. Did you try Great Expectations? Yeah, I did some trials with GX Cloud and a connection with Snowflake, because right now they only support Snowflake and, I think, Postgres connections. So, on the cloud? On the cloud, okay. I didn't try this, but theoretically you can still run your own GX checks on your own machine or server and send the results so that you can see your checkpoints, your data quality checks, in the cloud version. But then, coming back to the pain points: what GX brings is not only the definition of the config files, it's also what I just mentioned, checkpoints. It feels difficult, I played with it, and it is difficult. They have all these concepts that you need to implement before actually starting to monitor your data quality. So you need to define your expectation suite, your connection, then you need to define a checkpoint. It's like, what is the checkpoint?
Speaker 2:Exactly. It's like a run of your expectation suite. Okay. It's been a while since I've touched that, but it doesn't feel like you actually need to have a checkpoint. You could just run your expectations, you could have your results, and then, okay, result for today, result for tomorrow.
Speaker 1:Yeah, I see. But then, on the other hand, by having to configure these things, you give a lot more control to the user, I guess. Are you saying that for data quality, for this specific domain, you believe the way to go is to be a bit more opinionated and give more things out of the box?
Speaker 2:It always depends, unlike Bart would say, it's a difficult one.
Speaker 1:Shout out to Bart, if you're listening. He's going to leave a comment here on the live stream: Paolo, you're fired. See you guys.
Speaker 2:Giving more control and granularity is sometimes good, but I feel like it's unnecessary here, because I've implemented data quality frameworks where I've never used checkpoints, because you have all the information you need in the results that Great Expectations gives you: the timestamp and everything like that, and the expectation suites as well. So it feels a bit unnecessary. I see.
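The argument here, that a suite of expectations can simply be run and the timestamped results kept, with no separate checkpoint concept, can be sketched roughly as follows. This is plain Python and not the Great Expectations API; `SuiteResult`, `run_suite`, and the `orders` data are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

Row = dict[str, Any]
Expectation = Callable[[Row], bool]


@dataclass
class SuiteResult:
    suite_name: str
    run_at: datetime  # when this run happened
    outcomes: dict[str, bool] = field(default_factory=dict)

    @property
    def success(self) -> bool:
        return all(self.outcomes.values())


def run_suite(suite_name: str, expectations: dict[str, Expectation], rows: list[Row]) -> SuiteResult:
    """Run every expectation in the suite and return one timestamped result."""
    result = SuiteResult(suite_name=suite_name, run_at=datetime.now(timezone.utc))
    for name, rule in expectations.items():
        result.outcomes[name] = all(rule(row) for row in rows)
    return result


# A hypothetical suite: a named collection of rules for one data set.
orders_suite: dict[str, Expectation] = {
    "amount is not null": lambda r: r["amount"] is not None,
    "amount is positive": lambda r: r["amount"] is not None and r["amount"] > 0,
}

# "Result for today, result for tomorrow": just keep the results of each run.
history: list[SuiteResult] = []
history.append(run_suite("orders", orders_suite, [{"amount": 10}, {"amount": 5}]))
history.append(run_suite("orders", orders_suite, [{"amount": 10}, {"amount": None}]))

for run in history:
    print(run.run_at.isoformat(), run.suite_name, "passed" if run.success else "failed")
```

Whether the extra layer is worth it in a real deployment depends on what it adds beyond this (for example, actions triggered on failure), which is the trade-off being debated in the conversation.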
Speaker 1:Yeah, I do. Yeah, Great Expectations, I think, is one of the most popular ones.
Speaker 2:I think it is the most popular one in the open source space, actually.
Speaker 1:Because I see a lot of... well, they were only open source for a long time, right? But I have seen a lot of other tools that use Great Expectations for data quality. Hopsworks, I think: Hopsworks is a feature store, but they also embed a bit of data quality, I think, with Great Expectations. I think I've seen some other ones that use Great Expectations as well, so yeah.
Speaker 2:But to come back to the cloud version: it's nice, but it's a first iteration. When you see what other tools have out there, whether paid tools or open source tools that now have a hosted version, and when you think of other data quality tools like, I've got the name, Monte Carlo, for example, it feels very early stage compared to what the other data quality tools are putting out there.
Speaker 1:But I think it's a nice first step. But then you feel like it's early stages. Why?
Speaker 2:Because it's very bare bones. It's like there's not a lot of functionality? Exactly, yeah. But it's a nice first step and I'm looking forward to the next iterations. I think it should be a bit more streamlined, though. In the cloud version, well, they really just put their building blocks on a web page and that's it for now, so that's why I say it's very early stage. And do you know what the priorities are...
Speaker 1:...of Great Expectations as a company? Do they want to invest heavily in the cloud? I mean, that's what brings money.
Speaker 2:So I guess yes, but uh, yeah, I don't know um maybe well, I don't know.
Speaker 1:I want to change the subject. That's why I'm looking at you but yeah, just one last note.
Speaker 2:I know it feels like I'm trash-talking Great Expectations, but I'm actually very grateful for the work they did. They built a nice product, but I think it's getting a bit clunky.
Speaker 1:Yeah, and I also think for a lot of these things like great expectations. I don't know if it was one of the first ones, but, like we said, it's probably one of the one of the most popular ones, right, and I think like, same thing with pun, the same thing with a poetry I think, when you get a lot of popularity, it's very easy for people to look and exactly learn from those mistakes and kind of you know, criticize.
Speaker 2:So I think everything we say here we need to take with a grain of salt, of course. And I do think they built a nice product, like I said, and I think the next iteration of the cloud version will be even better than what they have now.
Speaker 1:Cool. Looking forward to that. And you mentioned the Great Expectations Cloud. That's how they make money. How are they making? Because, again, this project has been going on for a while. Yeah Right, how long has there been a company behind it? And by company I mean people getting paid to do this full time.
Speaker 2:I want to say Superconductive, so the company behind Great Expectations, has been around for six years. Six? I'm not exactly sure on the timeline, but I know that at some point how they got money was that they were funded. I remember seeing some high number and everybody getting really excited, because finally some open source tool got funding. Yeah.
Speaker 1:And the company was called Superconductive, and now they rebranded to Great Expectations.
Speaker 2:No, no, it's always been Superconductive. So basically there is this company, and they have this product, which is Great Expectations, and the company got funding to build Great Expectations. I see. But was the company doing other things as well?
Speaker 1:No, no, the company was for this project. So, if I understand what you're saying, they had an open source project, they created a company, and then people basically invested in them to build this product out, but they didn't have a cash inflow, like from a product or something.
Speaker 2:It was just the open source project.
Speaker 1:Maybe do you, mel? Do you have any comments on that? Like, do you feel like this is a way to do open source? So people got excited when they first heard it, but do you think this is the solution to open source?
Speaker 2:That's a difficult one. Now, you know, my boss... I'm in a pickle now. Yeah, I think for some tools it does make sense to at some point offer hosted services. For Great Expectations, it totally makes sense. I was super surprised that it took them so long. But again, I don't know what's happening behind the scenes to actually get a product out, because I think a lot of people are interested in having data quality without being bothered by scheduling it, adding it to their data architecture, everything like that. And let's maybe take another example, DuckDB. But maybe not DuckDB, because that's a different case: the people building MotherDuck are not the same people building DuckDB. But let's maybe take Prefect.
Speaker 1:Prefect.
Speaker 2:Prefect. They built the open source version and then went on having a cloud version right.
Speaker 1:Or, yeah, like this one, right. So Prefect, for the people that do not know what it is, is basically a way to create workflows. Yeah. So you have basically say do this, then do that, then do that. That's what prefect does. It helps you kind of stitch things up, right?
Speaker 2:Yeah, and so Prefect, I don't want to say anything rubbish, and if someone watching this feels like I'm saying something wrong, just mention it, but I think they started with an open source solution and then went on to have this cloud version, which is really nice to use. And for them it makes sense as well. Because with orchestration, looking at the past projects that I've seen and worked on, and at people interacting with orchestration tools and implementing them, not actually implementing the DAGs, so, like you said, the steps one after the other, but actually doing the infrastructure, yeah, it seems like a pain in the ass. So having something that takes away this idea of infrastructure and just says, okay, this is the tool, we help you set up the agent, so the computer that will trigger the steps one after the other, and you pay for it, I think this is just really nice. Yeah, I agree.
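The "do this, then do that" idea behind a workflow, and the DAG that describes which step waits for which, can be sketched with only the Python standard library. This is not Prefect's API; the step names and the `run_dag` helper are hypothetical, and a real orchestrator adds scheduling, retries, and the infrastructure that the speakers call the painful part.

```python
from graphlib import TopologicalSorter
from typing import Callable

Step = Callable[[dict], None]


def extract(ctx: dict) -> None:
    ctx["raw"] = [3, 1, 2]


def transform(ctx: dict) -> None:
    ctx["clean"] = sorted(ctx["raw"])


def load(ctx: dict) -> None:
    ctx["loaded"] = len(ctx["clean"])


steps: dict[str, Step] = {"extract": extract, "transform": transform, "load": load}

# The DAG: each step maps to the set of steps that must finish before it runs.
dependencies: dict[str, set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_dag(steps: dict[str, Step], dependencies: dict[str, set[str]]) -> tuple[list[str], dict]:
    """Run every step once, in an order that respects the dependencies."""
    ctx: dict = {}
    executed: list[str] = []
    for name in TopologicalSorter(dependencies).static_order():
        steps[name](ctx)
        executed.append(name)
    return executed, ctx


order, context = run_dag(steps, dependencies)
print(order, context["loaded"])
```

Everything here runs in one process on one machine; the managed offerings discussed above are essentially selling the part this sketch leaves out.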
Speaker 1:I think other examples you can think of are, I don't know, Airflow with Astronomer, even though it's a bit different, because Airflow is a standalone project and Astronomer is a managed version of the software. With Airflow, I know a bit more of how it works underneath. Indeed, it's not just the software: there are also a lot of different virtual machines, you need a database, you need to do this, you need to do that.
Speaker 1:I think it's the same thing with MLflow, right? It's an open source tool, but to deploy it you actually need different components that work together, which makes the actual overhead to build these things and to maintain them much, much higher, right? And I think with those things there's, quote unquote, an easy way to make money, by saying: I'll manage these things for you, you just use the software and I'll manage everything underneath it. But then I also see that on the other side... and maybe, before we move on to that, there are projects that are open source where someone later decides to do a managed version.
Speaker 2:Yeah.
Speaker 1:And there are people that say from the get-go: we're going to do a managed version, but we're also going to have the open source part. Terraform is another example, a bit controversial because they changed the license, right? Even though that one is the other type, it's more of a tool, I feel. So if you have a CLI tool like Terraform, it's basically a package that you install, which is also open source. But I feel like there is less of an argument there for building a service around it.
Speaker 2:Yeah, right, exactly. It's a bit the same with Astral and Ruff, I think, and the other services that they provide.
Speaker 1:I mean, it's very strange that... yeah, exactly. So, well, Astral is the company, and they're basically a company that provides Python tooling, so they have Ruff and now they have uv. Yeah, we spoke about this two weeks ago, I think, as well.
Speaker 2:But it's very strange because this is a tool, right? You have it in your CI/CD. It's not really difficult to implement either. The infrastructure behind it is really simple because it's Rust, and then, yeah.
Speaker 1:I mean even, like I would say, the infrastructure, like the actual machines that run. This is just your computer. Yeah exactly Like. You don't need a server, you don't need this, you don't need that, right.
Speaker 2:So it's very strange to build a company based on this, and I don't know exactly what's the angle there.
Speaker 1:Yeah, I think there was also a comment here. How did they say it? I need to find it. There was a blog article where they discussed how they're starting a company, and they actually got a lot of money from funding. So again, this is a bit similar to how Great Expectations started, I guess: they had an open source project, it was very popular, they wanted to figure out a way to make money from it, and then they got a lot of investors. So basically, people gave them money so they can pay their bills, their personal life bills, and work on this full time. But personally, I still wonder how far you can take this, because if you have investors that put money into this, I still wonder if people are going to wait for a return at some point. I think so, right?
Speaker 1:Yeah, actually there was also some controversy about this. Ah, here: "we've raised four million in seed funding". The reason there was a bit of controversy around this particular story is that Ruff re-implements a lot of stuff that was already done, right? And the other tools, they were all open source, and there was no ambition to start a company.
Speaker 1:So some people felt a bit demotivated to contribute to open source, because they said: well, I'm doing this in my free time, and now someone basically takes all the ideas that I've put together and rewrites them. So they still do some work, right, but they're re-implementing, and now they've made a lot of money from it. So there's also that question: is this good for open source altogether? Are you going to be motivated to do this in your spare time when you know there are people making money doing this?
Speaker 2:So it's a bit um yeah, and I totally get it, and for this it's more difficult. Yeah, yeah, it's it's a difficult one. But like I totally understand that and I wanted to ask the question like when? When should an open source contributor be paid? Because people contributing to Black, for example, which is something that Ruff is based on, I think yeah, should they get paid at some point, because everybody uses it.
Speaker 1:Yeah, I think if they should or they do I think are different things, right, I mean they don't. I think that's kind of the game of open source you know, you're not, you're putting it out there and the idea is that you're not guaranteeing, you're not responsible for what the software does. Yeah, exactly.
Speaker 1:But at the same time, I'm not expecting anything in return, right? I just put it out there for you. So, yeah, again, I hear the arguments for both sides, but I also feel like this approach of starting a company to see if you can back the project up has become a bit more popular. Yeah, sometimes it makes sense, but sometimes it doesn't really. I mean, that's right now. Yeah, let's see what they come up with.
Speaker 2:I'm sure they have some business model behind it that they presented to get four million in funding. What I've seen as well is the consultancy model. So I think, for example, maybe DuckDB is an example of that, maybe, I'm not sure.
Speaker 1:So for the people, that's DuckDB. Maybe you want to give a quick explanation of what DuckDB is for people that never heard of it?
Speaker 2:Well, I can try. So DuckDB is an OLAP processing engine, so basically online analytical processing, and the specificity of DuckDB is that it works on one machine. DuckDB arose because people saw that maybe you don't need Spark for everything that you do. Spark is a processing engine that works on multiple machines, to put it simply, and that helps you deal with hundreds of gigabytes of data. But what happens is that most of the data sets you use for your analytics are sometimes only one month of data, and then you have like 10 gigabytes of data, which can be handled on one machine, reducing the cost and overhead that Spark brings. Yeah, so it's very optimized for a single machine. Exactly. And I think the idea is also that today, with the cloud, it's very easy to get a bigger machine, right?
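The "one machine, no cluster" workflow can be sketched with an in-process database. To keep the example runnable without installing anything, it uses Python's built-in `sqlite3` purely as a stand-in: SQLite is a row-oriented engine and not an OLAP engine like DuckDB, so this shows the embedded, single-process way of working rather than DuckDB's API or its performance. The `sales` table and its rows are made up for the illustration.

```python
import sqlite3

# Open an in-process, in-memory database: no server and no cluster to set up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-03-01", "north", 120.0),
        ("2024-03-01", "south", 80.0),
        ("2024-03-02", "north", 100.0),
    ],
)

# A typical analytical query: aggregate a month of data per region.
totals = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)

con.close()
```

The point made in the conversation is about scale: when the slice of data being analyzed fits on a single machine, this style of local query avoids the cost and overhead of a distributed engine.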
Speaker 1:So I think DuckDB is an interesting story because there's the open source project, and they do have a company behind it, I want to say. I do think that the way that they get money is from consulting. I maybe need to be fact-checked.
Speaker 1:It's a non-profit company, and then from DuckDB also arose MotherDuck, right, which you already mentioned, which is basically a managed version. And it's a different company. It's a different company, yeah, but they use the same technology underneath. The idea with MotherDuck is that you don't run the stuff on your computer, you run it somewhere else. It can be a big machine, it can be a lot of data, you can join all these things there. Yeah. So another one: FastAPI.
Speaker 2:Do you know FastAPI? Yeah, you told me about FastAPI, but I didn't see any mention of, like, SaaS.
Speaker 1:Yeah, so FastAPI, for the people that do not know what it is, is a Python framework for creating REST APIs. It's very, very smooth to use, I think that's the winning point, and it's fast, right? You just put these decorators on some functions, and basically you have a REST API with documentation and a lot of other stuff, based on the type hints. That's the TL;DR, very TL;DR. I was actually chatting with the creator, Tiangolo. Ah, is that his real name?
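The mechanism described in this turn, decorators that register functions as endpoints and type hints that drive documentation and input conversion, can be illustrated with a toy router built on the standard library. This is not FastAPI and does not reproduce its API; `TinyRouter`, its methods, and the `/items` route are hypothetical and exist only to show the idea.

```python
from typing import Callable, get_type_hints


class TinyRouter:
    """Toy router: decorators register handlers, type hints describe them."""

    def __init__(self) -> None:
        self.routes: dict[str, Callable] = {}

    def get(self, path: str) -> Callable[[Callable], Callable]:
        def register(func: Callable) -> Callable:
            self.routes[path] = func
            return func

        return register

    def docs(self) -> dict[str, dict[str, str]]:
        """Build a minimal description of every route from its type hints."""
        described: dict[str, dict[str, str]] = {}
        for path, func in self.routes.items():
            hints = get_type_hints(func)
            described[path] = {
                name: getattr(tp, "__name__", str(tp)) for name, tp in hints.items()
            }
        return described

    def call(self, path: str, **raw_params: str):
        """Convert string parameters using the handler's type hints, then call it."""
        func = self.routes[path]
        hints = get_type_hints(func)
        converted = {key: hints.get(key, str)(value) for key, value in raw_params.items()}
        return func(**converted)


app = TinyRouter()


@app.get("/items")
def read_item(item_id: int, limit: int = 10) -> dict:
    return {"item_id": item_id, "limit": limit}


print(app.docs())
print(app.call("/items", item_id="42"))
```

A real framework layers HTTP handling, validation errors, and generated API documentation on top of this; the sketch only shows why annotating a function is enough information to do so.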
Speaker 1:No, no, no, his name is Sebastián Ramírez, and actually he's Colombian. I was like, what the hell is Tiangolo? And he said, no, it's because of Sebastián: my family says Tian, Tian, Tian, and then Tiangolo at some point. Really, really nice guy, I really had a good time talking to him. And he was saying that they have some sponsors. This is another thing we could talk about, right: GitHub also offers sponsorships. So basically, if there's an open source project and you want to donate 20 euros a month, you can do that and make it recurring, which helps, but most of the time it's not enough for people to work on this full-time. Most of the time this is a hobby thing, right?
Speaker 1:But for FastAPI, what he was saying is that he actually got funding as well, from, I don't want to butcher it, but I think it was Sequoia. He basically got a fellowship. So this is not really like investing; he doesn't have a company. I even asked him: what do you think about these things? Have you ever thought of starting a company with FastAPI? And he told me: yeah, but FastAPI is a tool, it has to be open source, I don't know how... I mean, it's not a service, right? Which kind of goes back to the tooling versus service versus application distinction that we mentioned earlier. So we did talk about it. But the fellowship, I'm going to see if I can find it, but the FastAPI...
Speaker 2:But it does make sense for FastAPI to have a SaaS offering.
Speaker 1:Like you mean like a so easy, easy deploy.
Speaker 2:Sorry, yeah.
Speaker 1:So easily spin up.
Speaker 2:Yeah, exactly.
Speaker 1:Yeah, yeah, maybe you should start it. All right, do it. So I think this is it, I just Googled it here, first time I look at it: Fellowships, how Sequoia is supporting open source. FastAPI creator Sebastián Ramírez is the first recipient of the Sequoia open source fellowship.
Speaker 2:Oh, nice.
Speaker 1:So this is really more that Sequoia is basically funding his work for a year or something, right. Google also did something similar for the Python developers. Actually, I think the first full-time Python developer is Łukasz Langa, the guy that created Black. And I even heard him in a podcast interview, where he said, I'm so grateful, because I live in Poland and Google pays me a legit Silicon Valley type salary to work on this, right. So he was doing everything in the open. I think every week he was doing release notes or something of the work that he'd been doing for Python. So he seems like a very, very nice guy as well, very ethical, like, okay, everything should be in the open, I don't want to mess up this opportunity for other people as well. So there are a few cases like this, but I think we're far from this being the sustainable open source alternative.
Speaker 2:Exactly, right, but it's interesting that some companies are taking it upon themselves to actually finance some of this. I guess the reason Sequoia is financing this is that they're using FastAPI and they see value in FastAPI being maintained, I guess.
Speaker 1:Yeah, indeed, and I think there's a big argument there about how many companies benefit from Python. I mean, Python is a very easy example because it's a programming language, right, but take pip, the tool that you use to install Python packages. I heard that the main maintainer, I think this was some years ago,
Speaker 1:was like a master's student doing it on the side, it was something crazy like that. Yeah, um, so there are a lot of people that put a lot of time, like their personal time, into these things, right, and there are a lot of people that benefit, that make a lot of money from these things as well. So there's always the argument that companies should be doing more to give support back, right, and I think then there's a lot of discussion on, even if we all agree, what would this look like?
Speaker 2:Right, yeah, yeah. And how do you make sure that you pay the person fairly? Yeah, because, like the guy maintaining Python, he lives in Poland and receives a Silicon Valley type of salary. Take the opposite: if you live in Silicon Valley and you maintain an open source package, and some company in Belgium pays you, the salary would be very different.
Speaker 1:Yeah, it definitely would, definitely would. Just to share another thing: I think the official name is the Python Developer in Residence, so it's Łukasz Langa. This is from an interview on the Talk Python podcast, really cool. So, yeah, I think Google basically sponsored the PSF, the Python Software Foundation, for this. And indeed, right, even if you say yes, we agree, we should support: how much? What should it look like? There are a lot of other questions here, but I think most people would agree that the way open source is done today is not optimal at all. It's a bit crazy that things are still working the way they're working to this day.
Speaker 2:Yeah, exactly, yeah, because a lot of the tools you use are just open source stuff, and it's a lot. Yeah, you just wish that there are no malicious things in it. Right, I see where you're going there.
Speaker 1:You do wish that. And why are we mentioning this? This is a very timely piece of news. If you are in the tech world, not even the data world, in the tech world, you may have bumped into it, but if you're not, maybe we'll test with Alex here, maybe you're not aware of this. So what happened was there was a backdoor. Basically there was some, uh, how do you say, malware, I guess. Yeah, that's how I would say it. You know, I'm not an expert in these things, but basically there is an open source package called xz, and there was a new release. So basically there was a new version of this package that was on pre-release and everything was going fine. And then someone, and I want to name this person, Andres Freund, I don't know, I heard that,
Speaker 1:I saw somewhere that this word means friend in German. Oh really? Yeah, and Alex is nodding yes to confirm that, which is very fitting because he's been a friend to all of us. Exactly, yeah, to the world really.
Speaker 1:I mean.
Speaker 1:So, basically, I heard that he's a Microsoft guy, and he was doing some benchmarking on this new pre-release version, and then he noticed that there was, uh, something weird with the CPU usage, basically. And then he looked into it and realized that the tarballs, so basically the zip files, putting it very crudely, were infected. They were backdoored, meaning that people could actually run stuff on the computer that is running the software, right, and this is making a lot of noise still.
Speaker 1:We still don't know who did it, or whodunit. So maybe to set the stage here, because this is open source: there was this project, xz, or liblzma. They had basically one open source developer who was putting a lot of his spare time into this, and another user called Jia Tan (he or she, we don't know) started contributing to it. The maintainer was heading toward burnout, basically, because he was doing a lot of stuff and people were pushing. I think in open source, too, we have this almost culture these days that you expect stuff, right?
Speaker 2:Yeah, because the way you create issues, it's already called issues.
Speaker 1:Right, yeah, but then it's like, anyone can create issues, right. But the other thing is there's an expectation that they should be fixed soon, right? Um, I actually wanted to find it as well, because this is all public, so you can actually see the thread, right. And they're saying, like, hey, I opened this issue a week ago, but no one said anything. Are you going to support this or not? And the guy says, oh, I'm so sorry, I'm going through a difficult time. This is all my spare time. I'm almost going through burnout, and this and this. And then the other guy says, oh, I'm sorry that you're going through burnout, but really you should prioritize this, the community, this, this and this. So maybe it's just a blog post, let's see. This is a bit fresh. I'm not an expert in these things, but we will learn as we go here.
Speaker 1:So they are asking about XZ for Java: is XZ for Java still maintained? I asked a question here a week ago and I've never heard back. And then there's like, yeah, there's a bug, blah, blah. So basically they're just kind of having a discussion, right, like, is this fixed yet? This is not fixed. See: progress won't happen until there's a new maintainer; the current maintainer lost interest or doesn't care about it anymore. So they're pushing like this, and through this discussion the guy is feeling guilt. And there are also some YouTube videos where other open source maintainers addressed this. They relate a lot to this feeling, right, that you have a duty to the community and you're feeling burned out.
Speaker 1:And then this Jia Tan here kind of appears. They had been contributing to the project as well, so it also feels like it was premeditated. It was like they were really playing the long game. They found an open source project that had only one maintainer and they were actually contributing to it, and then, from what I understand, Jia Tan becomes a maintainer. So here: Jia Tan may have a bigger role in the project in the future. He has been helping a lot off-list and is practically a co-maintainer already. So that person actually becomes a co-maintainer.
Speaker 1:The person intentionally puts in a backdoor. There are also some other incidents, like the Log4j bug, but those things were unintentional, right, it was a mistake. This one is actually intentional. Someone actually went into the code and put in the backdoor. They also obfuscated it. The code is not on GitHub actually, so you can only find it if you get the tarballs from the GitHub releases. So there was a lot of effort put in to make sure that they weren't found, basically, right. Um, and then they released it, and just by kind of luck, dumb luck almost, the person that was benchmarking this saw that there was an issue there, and they really looked into it, and then they caught it, right.
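The point that the malicious code lived only in the release tarballs, not in the repository, can be illustrated with a small comparison. This is a minimal sketch, not how the xz backdoor was actually found (that came from the benchmarking described above). The function name `tarball_differences` and the `strip_components` parameter are hypothetical, and it assumes you already have a release tarball and a local checkout of the same version. Release tarballs often legitimately contain generated files that are not in the repository, so a difference is a reason to look closer, not proof of tampering.

```python
import hashlib
import tarfile
from pathlib import Path, PurePosixPath


def tarball_differences(tar_path, checkout_dir, strip_components=1):
    """Report files in a release tarball that do not match a source checkout.

    Returns a dict mapping each differing relative path to "missing"
    (not present in the checkout) or "modified" (content differs).
    strip_components drops leading directories such as "pkg-1.0/".
    """
    checkout = Path(checkout_dir)
    report = {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, and so on
            parts = PurePosixPath(member.name).parts[strip_components:]
            if not parts:
                continue
            relative = "/".join(parts)
            # Read the member in memory; nothing is extracted to disk.
            data = tar.extractfile(member).read()
            local = checkout.joinpath(*parts)
            if not local.is_file():
                report[relative] = "missing"
            elif (hashlib.sha256(local.read_bytes()).digest()
                  != hashlib.sha256(data).digest()):
                report[relative] = "modified"
    return report
```

Calling `tarball_differences("pkg-1.0.tar.gz", "path/to/checkout")` (both paths are placeholders) gives a list of files worth reading by hand before trusting the release.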
Speaker 2:Uh do we have any victims of this?
Speaker 1:I think maybe not, because I think this was a pre-release. Okay. So that's why it wasn't, in fact, everywhere. But if this wasn't just a pre-release, if this was actually released to everyone, it could be on a lot of people's computers, but more than that, it could be on a lot of servers. It was only for GNU/Linux on the x86 architecture, so it was not like every computer in the world, but still, right, there are a lot of servers running Linux, a lot of stuff. So, uh, I saw an analogy in the Fireship video that I thought was pretty interesting.
Speaker 1:So the analogy is: imagine that you're a landlord, so you have a building. The building is kind of falling apart. Um, you do this in your spare time, so you don't have time to make sure everything's flawless. And then this one person kind of comes and says, oh, I'll help you out. And they really start helping out. Everything starts looking great, they add a lot of features, they do this, this and this, and everything is great. And you say, okay, maybe you can co-own this building with me.
Speaker 1:And then one of the tenants of the building starts noticing that their energy bill is higher than it should be. So they start breaking the walls and they look at the wires, and they see that there were actually cameras planted in the apartment, kind of like that, you know. So it's kind of like someone that was using it realized there was something weird, looked into it, realized there was a backdoor, and flagged it. And there were a lot of questions, like, okay, this person was in burnout. If people weren't rude, quote unquote, maybe he would have stood his ground more, and he wouldn't have felt the need to really get help from whoever in the community. Um, there are a lot of different dimensions to this, like the social engineering and the technical, the actual engineering part, you know. That makes this a very interesting story to follow. Yeah, exactly, right.
Speaker 1:And also, the maintainer of the xz project, his account has been suspended on GitHub. Oh really? Yeah, yeah. And there were two sets of tarballs: one was signed by him, the other was signed by Jia Tan, and the ones that he signed had no backdoor. So they shut down his GitHub and all these things, but he's been trying to answer the questions, right. Yeah, so, um, yeah, and again, that's the question. I think another fitting thing, uh, yeah, is this xkcd here.
Speaker 1:So for the people that are just listening, this is the famous xkcd, Dependency, I guess. The picture basically has a whole bunch of blocks stacked on each other, and the blocks are labeled all modern digital infrastructure. And then right below all those blocks, there's a very tiny block here, and an arrow points at it and says: a project some random person in Nebraska has been thanklessly maintaining since 2003. Which is kind of how this feels. It's not kind of,
Speaker 1:it's actually this. Yeah, but the thing is, in this case, for the xz thing, what's more is that the attacker identified this project and identified that the person was going through burnout. Yeah, they actually contributed to the project until they were a co-maintainer, and then they, uh... Do you think that the person did this purposefully? Yeah, yeah, like, if you go through the actual code and what it does.
Speaker 2:Yeah, yeah, this, this, yeah, this, I believe, but like they didn't just stumble upon the package and then they were interested in and then saw some opportunity to, I don't know, do something malicious.
Speaker 1:Yeah, it's hard to tell.
Speaker 2:Did they really go look on GitHub and say, ah, well, my plan, my 10 year plan, is actually finding 10, uh, GitHub repos where you have some, uh, one-man army maintaining the package? And then... It's hard to tell, right, like we don't know.
Speaker 1:I think there's even the discussion on, was this one person? Because Fireship mentions, I don't know if it's jokingly or not, but what if this is a government organization? Right, like, we don't know. It's a GitHub profile.
Speaker 2:And did they contribute to anything else?
Speaker 1:We can take a look, let's see. So you see, here there are some commits, some pinned repos. Show more activity, actually there's a lot of... See, oh, it's this. Yeah, they have some stuff, but see: xz. You see here on the account that it's been xz, xz-java, oss-fuzz. Yeah, I'm not sure, it's hard to know exactly what's happening.
Speaker 2:No one knows who the person is and now they're banned. They were, they were banned from from GitHub.
Speaker 1:Yeah, yeah, I think their account is suspended. I don't know, I cannot see this. Let's see. I think I saw somewhere in the. Yeah, I don't know, I'll have to look at it later. But in any case, it also begs the question, right, like maybe we should Like this is not sustainable.
Speaker 2:No no.
Speaker 1:Right. People pushing people on Twitter, on GitHub issues, on whatever, it's not sustainable. And I think there is a big vulnerability there, right, when there is someone that is motivated and has the skill set to do this.
Speaker 2:You know, imagine something like this happened to a linting tool for Python.
Speaker 1:Yeah, but imagine if this already happened and we don't know about it.
Speaker 2:Yeah, exactly.
Speaker 1:Might be. Because do you go looking at exactly what Black does or what Flake8 does?
Speaker 2:and I think still like for this one like this is a compiled thing, so it's like, even then, yeah right so yeah more difficult to catch.
Speaker 1:Yeah, so, I think also, bringing it a bit to the data and AI world, right, there are also the models you download from Hugging Face. I actually have a colleague, Louise, who showed how a lot of these models are basically Python pickle files, and pickle files are basically a set of instructions, right, so they actually run code when you load a pickle file. And she was showing how you can alter the pickle file so that those instructions include a backdoor, for example. So she even showed, like, Vitaly opens a model from Hugging Face, and after that she showed how she could go into Vitaly's terminal, his computer, and go through the files and all these things. The limitation with that one is that when the computer turns off she loses the connection, you know, but there are a lot of things that could be done, right. And yeah, it's a bit scary. Exactly, yeah, it's a bit scary, but yeah, let's see. Again, things are still developing on this story.
Speaker 1:Uh, let's see how it goes. Yeah, I mean, it's a bit scary. Yeah, but really, I would advise, if you like drama and you're in tech, just go, there's a rabbit hole of these things, like, who's Jia Tan, who's this person, who's that person. But really, nobody can just track this person? I don't know. Yeah, I don't think so, um, because the person really hid their traces, man.
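To make the pickle point concrete, here is a minimal, harmless sketch of why loading a pickle runs code. It is not the demonstration Louise gave; the class name `LooksLikeAModel` is hypothetical, and the callable is deliberately the built-in `print` rather than anything dangerous.

```python
import pickle


class LooksLikeAModel:
    """Hypothetical stand-in for an object stored in a pickle file."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object on load:
        # "call this callable with these arguments". Whoever wrote the
        # file picks the callable. Here it is the harmless built-in print.
        return (print, ("this ran while the pickle was being loaded",))


payload = pickle.dumps(LooksLikeAModel())

# Loading calls print(...) as a side effect the caller never asked for.
loaded = pickle.loads(payload)

# The result is whatever the callable returned (None for print),
# not a LooksLikeAModel instance.
print(type(loaded).__name__)
```

The Python documentation for `pickle` carries the same warning: only unpickle data you trust, because the file decides what gets called during loading.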
Speaker 1:It was really yeah, yeah I mean yeah, I don't know. I mean like, maybe if GitHub has logging data, right like a region or something. Would they also be able to share that?
Speaker 2:probably not yeah, I guess not right. I mean, maybe they're in the USA.
Speaker 1:So yeah, but what if the person's not in the USA?
Speaker 2:does the NSA care?
Speaker 1:That's above my pay grade. All righty. Um, maybe one quick piece of news. Uh, talking about open source, some things are not open source but they're getting more open source.
Speaker 1:Let's say this is the case for Mojo. I think, Paolo, correct me if I'm wrong, we did talk about Mojo. We did talk about it not being open source. Yeah, it was always the ambition for Mojo to be open source, and a while ago, we didn't have time to cover this on previous episodes, they actually open sourced the standard library. See, nice: announced the release of core modules from the Mojo standard library under the Apache 2 license. So this is an open source license. So they are trying to get more people engaged, trying to get more people to contribute, trying to get more people to open up the box and see what's inside. Um, I haven't tried it in a while, but, uh, my opinion maybe a year ago was that they still had a long way to go on this. Yeah, because it was similar to Python? Yeah, so exactly, the promise is a fast Python.
Speaker 1:It's not really a fast Python yet. Well, actually, I don't know if it'll ever be, because of the fast part: if you really want to be super fast, um, the code doesn't really look much like Python anymore.
Speaker 2:Okay, like it has some trade-offs. It looks like it was a language inspired by Python. Okay, this is my opinion as well. Yeah, yeah, and maybe to digress a bit from Mojo, I feel like, I don't know if it was you that discussed this, but I feel like for every bottleneck that your Python program has now, if you can rewrite it in Rust or C++, let's say, then you're good to go.
Speaker 1:You don't need to have a fast Python. Yeah, indeed. I think a lot of the times we talk a lot about speed, but, to be honest, most Python things are actually fast enough. I don't know, I think the speed, even with the linting tools you talked about, um, Ruff and, uh, uv, yeah, the big tagline is, oh, this is a fast Ruff, a fast Black, a fast X, a fast Y, but sometimes I wonder if we're solving problems that we never had. Yeah, I see what you mean.
Speaker 2:Like, for Ruff, I do believe there is a use case.
Speaker 1:You think? Because before I heard of Ruff, I never had issues where like oh, this linting is taking so long. I did, you did yeah. Yeah, I never had that, but I still switched. Huh, that's the thing I mean. Same thing with UV. Do you have problems with installing Python packages?
Speaker 2:With installing Python packages? No. I mean, yeah, when I use conda.
Speaker 1:What do you mean, like when you're installing stuff with conda? Yeah, it was too slow. Okay, um, yeah, I think maybe I do when I have some dependency resolution, but yeah, I don't know. Maybe my position is: I will use it if it's there, but I wouldn't spend time to build something faster either.
Speaker 2:No, no, exactly. I'm happy it's there and I'm happy, like it's getting traction and people are using it because, anyway, it's maybe a very broad statement, but if you can be more efficient and use less resources, it's also better. Yeah, but Even if it's like it's just a mindset, If you try to optimize everything, it means like okay, your CI-CD is running super fast. That means that you will look at your code and try to optimize it as well.
Speaker 1:Yeah, but so I have a few issues, quote unquote, with this. One is you can always optimize it further, so when do you stop? It's almost like writing a book: you can always review it and make it better and do this and that, but where do you stop? And the other thing is the 80/20 rule, right, like after a certain point, the time that you invest, I think it's diminishing returns or something, right, and in the end your time should be more valuable than CI/CD time. Yeah, but I think that's why open source is very interesting.
Speaker 1:It's like one person, yeah, just doing God's work. Yeah, that's true, like one person does it and everyone benefits from it. Exactly, because imagine the amount of time, I'd count this in milliseconds, that people win because they use Ruff instead of Black.
Speaker 1:It amounts to maybe a month now, and, yeah, in 10 years it might be a hundred years. Yeah, yeah, I think it's also the case with Pydantic, because now Pydantic is in Rust. I also heard the argument of actual environmental sustainability, because you use less energy, you take less time, so you can do the same computation in a more environmentally friendly way.
Speaker 1:So it is a good argument, but I also, yeah, I don't disagree, but I do feel like a lot of the times teams are focusing on this, and in my head it's like, there are probably better things to invest your time in. Yeah, I wouldn't do it myself. You're the guy that creates issues like: this is too slow.
Speaker 2:Yeah, exactly yeah, okay, your mental health is uh, your mental health is in danger, but uh did you but the micro is taking too long, you know, did you? Did you see what I posted?
Speaker 1:I know you have all of these books, but I put this two days ago and you still didn't finish it.
Speaker 2:That's what I wrote to Bach before. But yeah, I wouldn't do it myself and I do see your point, but I think, yeah, it's good that it's out there.
Speaker 1:Yeah, yeah, yeah, no again.
Speaker 2:I mean, I'm saying this but I jumped on the Ruff train. Yeah, exactly, I jumped on the uv train.
Speaker 1:You know, it's like, don't get me wrong, don't get me wrong. All right, maybe, um, ready for some, uh, sizzling takes? Super hot. Um, so this one, there's a bit of a backstory to it. This is from a friend of mine, actually, Rochelle Dyer. Shout out to Rochelle if you're listening, probably not, but shout out to him. The title of his blog post is Becoming Clout Certified. Rochelle, fun fact, is also an artist, much like Alex. He also likes to paint and draw, but you can see here that he's not an artist with images only, he's also an artist with words, you know. So he has a clever pun here: clout. Alex, explain to me, what does clout mean?
Speaker 2:Isn't it like chasing after, like wanting to become popular?
Speaker 1:Yes, I believe so. I mean, actually, I think it is, but it makes a lot of sense, so even if it's not, we're going to ride with it. So basically, he's making the point here that being cloud certified is more about reputation. It's more about showing something shiny to someone else than about the actual value of a certification. So he actually starts the article by saying that he got a lot of certifications: GCP Associate Cloud Engineer, AWS Solutions Architect, AWS Data Engineer Associate, right. So in one month he got three cloud certifications.
Speaker 1:He had a few thoughts here, and he named them. So he says here: given that the best things come in threes, I have decided to divide my thoughts into three certified illusions. Let's call them the illusion of knowledge, the illusion of value, and the illusion of prestige. So, kind of skimming through things a bit here, um, he says that the illusion of knowledge is that you can be certified but not know much about the thing you got certified in. Um, there's one quote from someone else that he put here, I think it was advice from a friend, a colleague or something: after a while, you need to decide to stop studying to understand and start studying to pass the exam.
Speaker 2:Yeah, that's kind of related.
Speaker 1:You agree to that yeah?
Speaker 2:indeed, but on the contrary, I also see some people that are really trying to understand exactly what's happening. Yeah, and then they pass with flying colors, but then it takes much longer.
Speaker 1:Yeah, I think I agree with that. I feel like if you know, you're very likely to pass, but if you pass that does not mean that you know.
Speaker 1:No, exactly. I think, with my first certification, I remember I left there and I was like, I have no clue about any of these things, but I passed the certification. So, yeah. And he says the groups of people that are going to be fooled by this illusion are the people that are browsing through the certifications and saying, oh, if I get this, I'm going to be proving that I have knowledge of this. I think if I tell you I have an AWS machine learning specialty, you're going to think it's cool, but you're not going to be like, oh yeah, Murilo knows everything about that. Yeah, I know, you know. Exactly. Yeah. And he also said that the people that go through the process understand this.
Speaker 2:So it's more for the people that have never done it. Yeah, it's true, right.
Speaker 1:Actually, he also said that the people that are responsible for hiring and staffing. They can fall into this trap, but I think it's more like a college degree kind of thing. You have a paper to show, you have a certification, and I think for people that are not in this, I mean also, how do you assess these things?
Speaker 2:I guess you must have somewhere in your interview process where you really speak with the person and say, hey, okay, you have an AWS certification. What did you do with AWS, for example? Yeah, true. And then, based on what the person says, okay, you can assess.
Speaker 1:Yeah, this guy or this person knows his shit. But I think it's like, do you need to ask the question, do you have a certification?
Speaker 2:Because to me it's like, if you just tell me what you did, you know, I wouldn't even ask about the certification necessarily. Yeah, but by the time they arrive at your point, they were being screened first by, yeah, HR and some automatic processing tool, and maybe there is some filter or even some bias in HR, because they see, okay, this person has an AWS certification, I can send them to Murilo. But I think it's the same thing, in a way, as a college degree. Exactly. I had some exams where I had no idea what I studied for. Oh wow, wow, shout out to kayla. Um, no, because I also,
Speaker 1:I did a talk at a university and the students were like, do you think you still need a college degree to become a machine learning engineer? And I said, honestly, you don't really, but you still need to get through the door. If you don't get through the screening process, you're not going to get to the interview to actually show your skills. And today, the reality, and I don't like this reality, but it is the reality, is that there are so many applicants sometimes that you need to find reasons to filter people out. Right, there are only so many hours in a day that you can look at CVs, and if you can't look at everyone's, you're going to miss a lot of good people, right?
Speaker 2:So and and and this is really hot topic I would argue that you might need a PhD.
Speaker 1:Yeah, phd, yeah, yeah, into the most prestigious companies. Yeah, I think uh back like when I was a student. I remember looking. Indeed, a lot of them said you need a phd or research experience or I don't know. Usually say like 10 years experience working, right so, but yeah, so, indeed.
Speaker 2:So that's the illusion of prestige. Yeah, but I do see his point. Huh, yeah, he's a smart guy, he's a smart guy. Uh, shout out to Rochelle again. He has his way with words. He does, he does.
Speaker 1:He's a yeah, artist, what can I?
Speaker 1:say? Yeah, you said it. Uh, the illusion of value. So he asks, um, why would you take a certification? Because it's valuable. And that's an illusion, according to him, right? One of the things he mentions that maybe is relatable: you know, you're in between projects and you say, oh, what should I do with this time? Ah, let's look at a certification. And why? Oh, because it's valuable. That's the illusion that he's referring to here: it's not valuable in the sense that you're going to be able to do so many more things if you get the certification. And he does mention here that it's not like there's no value. But I think he thinks that people who place value in the certifications probably place more than there actually is.
Speaker 2:Yeah.
Speaker 1:The same thing with the college degrees that we were mentioning, right. Are college degrees useless? No, they open a lot of doors. There is a lot of stuff there, right? I think, statistically, if you take all the people in the world that have a college degree in computer science, they're going to know more about computer science than all the people in the world taken together. There will be a pattern there, right? But that doesn't mean that any given one of them is going to be exceptional. No, it doesn't.
Speaker 2:But I do think there is value in having such a certificate, even if you study by heart and never touch the cloud service. So let's say for a cloud provider, you will know the services. You will know, okay, this can be used to store Docker images. This can be used to launch those images. This can be used to launch a Python function at scale, for example.
Speaker 1:Yeah.
Speaker 2:So, and even though that okay, they might not have touched it, they will know it.
Speaker 1:I mean, I think, yeah, but I think to me the difference is knowledge versus skill, I guess.
Speaker 2:Yeah.
Speaker 1:You know, if I tell you what my phone number is, that's knowledge.
Speaker 1:But you can, like, yeah, you can do some things with it, but it's not like now you can build a new, you know. Like, I think that they are connected in a lot of ways, right, but I do see that these certifications, they do test a lot on what you know, right, which I guess links a bit to the last one, which is the prestige. He mentioned the North Korean generals with a lot of medals, and that's how he feels. When he got the three, he put three of them on his LinkedIn and he felt like one of those people, right?
Speaker 2:Yeah, when you have too many certifications, it's, uh, strange, huh? It actually does the opposite of having prestige. It feels like maybe they're overcompensating. I don't know, I don't know, hot take, this is a hot take.
Speaker 1:This is a hot take. But he actually also mentions the imposter syndrome, right, and having the certifications kind of gives you a sense of, oh, look, you know. And I think a lot of people in the profession, in the world, they relate to imposter syndrome. And I think after going through this, you can either feel better about yourself, because now you have something to show for it, but on the other hand, I think you can have just more imposter syndrome, because now you have the certification and you still feel like you don't know this stuff.
Speaker 2:You know exactly.
Speaker 1:Yeah, that's true. Yeah, it's a bit, uh, yeah, it's a difficult one. Yeah, it is difficult.
Speaker 1:So he talks about, well, certifications. Even though he mentions three illusions, they're not useless. And then he talks about how the system could be fixed, right. So he mentions one way: the way that the certifications are set up today, they are multiple choice, right? So you basically have text and then you say, is it A, B, C or D? And that's kind of how a lot of the exams go, but that's very flawed in a way. Right, I think a lot of times you can go online and see a lot of sample questions. Sometimes, if you just kind of know how these exams go, it's very easy to game the system without really knowing the answer. So he also mentions GenAI and some other, maybe more open-ended, questions for graders to kind of select things. Yeah, and he just kind of asks, well, how relevant are these things, right? Like, we can kind of play here and brainstorm, but how likely is it that these things are going to change?
Speaker 2:I don't know what to think. I think it's still valuable in some sense, and I think it will just take one new cloud company coming with a new type of certification, like with Databricks coming in a few years ago, just providing documentation during their exams.
Speaker 1:Yeah, Azure is doing it as well.
Speaker 2:Yeah, yeah, indeed. So I mean, it's just going to take one company seeing people like it works, so they just do that. And then we have, like you know, a whole new certification, new type of exams, maybe.
Speaker 1:Yeah, maybe, yeah, yeah, I don't know. I also don't know how realistic that is for people to grade it as well. I'm also wondering if you can do some type of exercise, you know, deploy this on this and deploy that in this.
Speaker 2:The resources to have this at scale. Imagine Azure having to do this for every applicant, unless they jack the price up.
Speaker 1:Yeah, but that will make the certification more prestigious, so would it be? Yeah, not sure. So I personally don't like certifications. I also don't like to study. I think I'm a bad student, especially today. When I was an actual student, I feel like I kind of sucked it up.
Speaker 1:I had to, many moons ago, but I feel like nowadays, when I have to sit down and study, I have a really hard time. Yeah, no, I'd much rather just do a project and show something at the end of the day than really study for a certification. I have a really, really hard time. But still, again, if someone tells me, okay, we need to work on this project and you need to demonstrate your expertise on this, could you get a certification? I still think that that's a valid argument. So, to sum up, he just says we need to be a bit introspective and ask ourselves the following questions: Am I committed to truly engaging with the subject matter? Am I just here for the prize? What value do I wish to gain from this certification, and is it more than I would gain from doing something else? And am I just doing this to make my LinkedIn profile look sexier?
Speaker 2:I think in the end it all depends on the person taking it.
Speaker 1:I think that's kind of his conclusion. You know, he's talking about illusions, but I think the important thing is that you don't delude yourself, right? Like, you don't trick yourself. Why are you doing this? If you like doing these things, if you like to learn more about these things, then go for it. You know, it's a very valid reason to do so. And also, to cover all his bases, he's saying that he only took a few certifications. He also had some experience with AWS as well, so those things also do matter, right? For example, what you mentioned in the beginning: if you have five years working with AWS, you probably can just study for an afternoon and take the exam and you're probably going to be fine. If you've never done this, you probably need to spend more time, and you can still pass, right? But the people that have experience, I think it's very unlikely that they will fail the certifications, and I think that's probably the population that these exam, the people that prepare the exams, I don't know how to say it.
Speaker 1:Yeah, yeah. I think that's probably the population that they want to cater to the most. They want to make sure that whoever should get a certification, because they know the stuff, will get certified. All right. So I think that's kind of it for today. Maybe, do you agree? Any last thoughts on certifications? How many certifications do you have?
Speaker 2:I can't count them Too many. I have one Azure, I have one Databricks, I have one Google.
Speaker 1:Okay, One Azure, one Databricks and one Google.
Speaker 2:Yeah, okay.
Speaker 1:No, you have a Prefect, I have a pre. I mean, they have, I have a Prefect one.
Speaker 2:I saw that one. I have five. I'm the certified man. There we go, man, good job, proud of you. All right.
Speaker 1:Yeah, I have a few myself as well. How many? Some of them expired. I have to go on my LinkedIn to check. You didn't take the exam to... No, no.
Speaker 1:But I think I don't have any Azure certifications. I think that's the only one that you can renew by just doing, like, this almost online quiz thing. Yeah, really. Yeah, all the other ones, you basically have to take the exam again. I think for Google they just say, we'll give you a 50% discount. For AWS, I think you can do one for free, but you basically have to retake it. Yeah, it's not a... Yeah, thanks, but no thanks.
Speaker 1:Why can I see this license? Okay, I have not, so humble brag, get ready: Prefect, Snowflake Core, Astronomer for Apache Airflow Fundamentals, AWS Machine Learning Specialty, Google Cloud Professional Data Engineer, HashiCorp for Terraform, and Professional Machine Learning Engineer for Google. That's it, drop mic.
Speaker 2:Oh, thanks. Nice, thanks, Alex.
Speaker 1:I needed that. But yeah, again, I did all this, and I think I'm living proof of how much it works: podcast host now, life is good. Just follow me for more tips on LinkedIn. Yeah, yeah, but cool. Thanks a lot for joining. This has been fun. Yep, enjoy your weekend. Any plans for the weekend?
Speaker 2:Yes, I have two friends having their 30 years. 13? 30. Ah, 30, OK. 13 years, like, OK, all right, bye, thanks everyone. And then prepare the show and everything at their apartment. So it will be nice.
Speaker 1:Okay, cool. What about you, Alex?
Speaker 2:No. Okay, I'm so sorry. Thanks, Alex. All right, cool.
Speaker 1:No, yeah, no.
Speaker 2:Maybe preparing some bloopers video.
Speaker 1:Yeah, on the weekend? No, no, no, but not on the weekend, that's fine. No, okay, cool, thanks everyone. Thank you, see you next time. See you. You have taste in a way that's meaningful to software people. Hello, I'm Bill Gates. I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.
Speaker 2:I'm reminded, incidentally, of Rust here, rust.
Speaker 1:Congressman, our iPhone is made by a different company, and so you will not learn Rust.
Speaker 2:I'm sorry guys, I don't know what's going on.
Speaker 1:Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here.
Speaker 2:Rust Rust Data topics.
Speaker 1:Welcome to the data. Welcome to the data topics podcast.