What's New In Data

Sovereign AI, Redpanda vs Apache Kafka, The Future of Data Streaming with Alex Gallego (CEO of Redpanda)

Striim

Redpanda CEO Alex Gallego joins us to talk about Sovereign AI that never leaves your private environment, highly optimized stream processing, and why the future of data is real time. Discover how Alex's journey from building racing motorcycles and tattoo machines as a child led him to revolutionize stream processing and cloud infrastructure. Alex also goes deep into the internals of Redpanda's C++ implementation, which ultimately gives it better performance and lower cost than Apache Kafka while using the same Kafka-compatible API.

We explore the challenges and groundbreaking innovations in data storage and streaming. From Kafka's distributed logs to the pioneering Redpanda, Alex shares the operational advantages of streaming over traditional batch processing. Learn about the core concepts of stream processing through real-world examples, such as fraud detection and real-time reward systems, and see how Redpanda is simplifying these complex distributed systems to make real-time data processing more accessible and efficient for engineers everywhere.

Finally, we delve into emerging trends that are reshaping the landscape of data infrastructure. Examine how lightweight, embedded databases are revolutionizing edge computing environments and the growing emphasis on data sovereignty and "Bring Your Own Cloud" solutions. Get a glimpse into the future of data ownership and AI, where local inferencing and traceability of AI models are becoming paramount. Join us for this compelling conversation that not only highlights the evolution from Kafka to Redpanda but paints a visionary picture of the future of real-time systems and data architecture.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns in real-world data architectures, and analytics success stories.

Hello, everybody. Thank you for tuning in to today's episode of What's New In Data. I'm really excited to introduce our guest today: we have Alex Gallego, CEO of Redpanda. Alex, how are you doing today? Great, thanks for having me, John. Absolutely. Alex, we've been talking about doing this episode for a while, and you have some amazing topics that you're going to share with the audience today. You're a really great technical leader with many patents and things you've worked on, especially with respect to stream processing and cloud infrastructure. But first, tell the listeners about yourself. Sure. So I guess I've always identified as a builder, and I've always built things. Ever since I was a little kid, I used to build and take apart racing motorcycles. I think when I was nine, I built a tattoo machine that I ended up selling. Then when I went to college, I found I just liked building software. I actually went to school formally for security training, and at some point I decided that I liked building more than trying to break into systems. I ended up building a stream processing startup that I sold to Akamai in 2016. That was really fun. It was a distributed compute framework written in C++, which will be a theme across this podcast. And while I was there, I ended up building what became Redpanda. I'm super technically focused as a CEO. This is my first time being a CEO; prior to this, I was a CTO, and I'm just really interested in building large-scale systems. Absolutely. Really incredible things you've worked on in the past and what you're working on now. As CEO of Redpanda, I'd love to learn more about your vision for the future of data and cloud architectures.
Yeah, you know, I think it's important to bring the audience along on how this idea of Redpanda came about at a technical level, so that the context makes sense and the words I'm about to say feel like connected ideas rather than an unconnected thought. When I thought about building a company, for me it was always about making the engineer, hands on keyboard, the hero of the Redpanda story, and we've been really consistent with that. In many ways, I've always built the company I wish I'd had when I was an engineer evaluating products. You'll see things like: why is it so hard? Why does it take a distributed systems expert to get a hello world up and running? Why does it feel like whack-a-mole to try to understand how anything works in this infrastructure? It's surprising that anything works at all. So there was this shift toward my desire to build really, really easy to use, simple systems. And I think my view of the world may be contrarian to what people have come to think and the way they've built applications, which is: the future is not going to be batch, the future is going to be real time. I think you'll see more and more announcements from companies like Databricks as they start to shift away from this modality of thinking, the punctuation of an infinite time horizon to make sense of the world. Let me use different words. Classically, the world was broken up into chunks, like an hour. What happened in the last hour? What happened in the last three hours? What happened in the last day? Last week I was talking to a gaming company, and they have this issue where they have a reward system, and every three days they run a batch job to catch cheaters, right?
It's a very classic example, right? Every three days they run this super expensive job, and that's how they've been doing fraud detection. And it turns out that they now have gamers who game the loot system and resell the coins and weapons they earn in the system every two days. So by design they're never ahead of the cheaters. That's one example of where the world is shifting away from this every-three-days report. Or take the idea of fraud: if you transact with your credit card, it would be absurd to most listeners dialing into the podcast to think otherwise. If I'm in downtown San Francisco, I would just expect that a charge from some Nigerian prince would get automatically denied. The idea of a batch process that runs at midnight and says, oh, by the way, someone bought a laptop with your card, would be crazy. So I think that's the fundamental shift that I see in the future of data architectures. Absolutely. And you're right, even the large data warehousing players that have traditionally built their business around batch processing are shifting to streaming architectures and streaming ingestion, and better interop with those sorts of systems. So there's certainly a lot of evidence in the market that that's what the end users and the customers and the data engineers and the CEOs want. I mean, really, when have you ever heard a CEO be proud of a dashboard in their company that's stale? No. When they talk about using data, they always say: we act on real-time data. Yeah, I've never met an executive who says the data is too fast and too fresh. That concept doesn't even make any sense. And yeah, I tend to agree.
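The shift Alex describes can be sketched in a few lines. This is a deliberately toy example, with made-up rule and field names, showing the streaming modality: every event is scored the moment it arrives, instead of waiting for a periodic batch job to sweep over three days of history.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    user: str
    city: str
    amount: float

# Each user's expected location (a stand-in for a real fraud model).
HOME = {"john": "San Francisco"}

def looks_fraudulent(txn: Txn) -> bool:
    """Flag any transaction made outside the user's home city."""
    return txn.city != HOME.get(txn.user, txn.city)

# Streaming: decide on each event as it arrives, not days later.
def on_event(txn: Txn) -> str:
    return "DENY" if looks_fraudulent(txn) else "APPROVE"

stream = [
    Txn("john", "San Francisco", 4.50),
    Txn("john", "Lagos", 3000.00),  # the suspicious charge
]
decisions = [on_event(t) for t in stream]
# The bad charge is caught immediately, not when a batch job runs.
```

A batch version of the same rule would only surface the Lagos charge at the end of the reporting window; the per-event loop is the whole difference.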
I think there's also this really important macro trend that, as consumers, we just take for granted, and that transitively looks like an event-driven architecture, or sort of a real-time architecture. I have three kids. When Sarah, my wife, was pregnant with twins, it was really refreshing to be able to show her the DoorDash app and say, look, the food is coming in three minutes. Or the idea on X, or any sort of social media, that you get to interact with people. At the end of the day there's this need for interactivity. For businesses, when you look at that trend at the infrastructure level, when you start to actually execute CPU clock cycles from code, it very much starts to look like a log. It very much starts to look like the systems I happen to have been working on for the last 16 years: this idea of a log, this idea of having an immutable source of truth on which you can then go and build a bunch of materialized downstream systems. Yeah. And the log itself is a fascinating concept when you really dive into its applications, and it's certainly going to be a recurring theme, and something for data engineers and software engineers to really study up on if they're starting to get more exposure to these real-time data systems. Before we go too far down that rabbit hole, I'd love to ask you: what's your vision for Redpanda, and why did you create it? What was the problem you were solving? I couldn't find a storage system that could keep up with the volumes of data I was trying to push, and obviously in a cost-effective way. To a large extent, at the time I was at Akamai, but prior to that I had been using Kafka for like nine years.
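The "log" Alex keeps returning to can be sketched minimally: an append-only sequence of events that multiple independent consumers replay, each at its own offset, to build their own materialized views. This is a toy model, not Redpanda's storage engine; the view names are purely illustrative.

```python
log = []  # the immutable, append-only source of truth

def append(event) -> int:
    """Append an event and return its offset in the log."""
    log.append(event)
    return len(log) - 1

# Two independent downstream "materializations", each with its own offset.
counts = {}         # e.g. a metrics view
search_index = []   # e.g. a search view
offsets = {"counts": 0, "search": 0}

def consume(name: str, apply) -> None:
    """Replay every event this consumer has not yet seen."""
    while offsets[name] < len(log):
        apply(log[offsets[name]])
        offsets[name] += 1

def count_event(e):
    counts[e] = counts.get(e, 0) + 1

for e in ["click", "buy", "click"]:
    append(e)

consume("counts", count_event)
consume("search", search_index.append)
# Both views are derived from the same immutable history and can be
# rebuilt at any time by resetting an offset to zero and replaying.
```

The key property is that the log never changes: downstream systems (Mongo, SQL, search, and so on, as Alex lists later) are all just replays of the same history.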
And I used all of them. I used Kestrel, I used just about every other version of a distributed log, in the hopes that the latency and cost-performance profiles of the systems made sense. And so I had built Concord, which was a computational framework. For people listening who don't understand what streaming is: at its most basic level, it's the idea that you can take a little bit of compute and a little bit of storage and chain them together, and at the end you have something useful, like fraud detection, or DoorDash delivery, or a real-time reward system; we power one of the largest reward games in the world. Those two concepts, when you chain them together, are roughly what stream processing is. It's just a different way of thinking about the world, rather than as a group of events. And people do that because you can do more with less, so you become more efficient as a business. You basically get to take advantage of both individual data items and groups of events. To level set the audience: batch has classically been about taking advantage of groups of events, like: hey, Alex is male, he has three kids, he's middle-aged, so I'm going to present you Milwaukee ads for a drill on Father's Day, because maybe you're going to click on this thing. Whereas the real-time thing is: hey, I managed to prevent a fraudulent credit card transaction at this Apple store at this particular time; I just saved the business $3,000. So that's the notion, and both have classically been useful, but the thing that was missing to connect both worlds is this idea of a log. Anyway, that's a little bit of background on what stream processing is. And so, in many ways, being an engineer is very freeing. You never have to ask people for permission.
You simply open up your laptop. At the time of writing Redpanda, I was using a Lenovo P50, and I grew up using Emacs and GCC, so I fired up Emacs and GCC. The thing I was trying to answer was: why is this so hard to use? Computationally speaking, why has streaming been classically so expensive, and so hard to use? I had seen at Akamai what a team of 29 engineers could support on hundreds of thousands of servers across tens of thousands of data centers around the world, and I thought: that's the operational model that makes sense for a company to scale. So in many ways, for me it has always been, like I said at the beginning of the podcast, about how I make that person, hands on keyboard, who's building the next generation of products, the hero of this story. And the theme for us, as I see it, is building the platform on which the future is going to be built. The future is not going to be batch; the future is going to be real time; the future is going to be powered by AI, and so on. There are all of these pillars, and that's where I see Redpanda playing an integral role. But I also don't want to say that we are the end-all of infrastructure products. I understand that Redpanda is what I would call level zero of your infrastructure. It is very much the first layer after your microservices. You take your microservices, and typically they'll land into something that looks like Redpanda, a distributed log, and then the rest of the business is built on top, whether it's materialized into Mongo, or a SQL database, or a search database, or what have you. The first layer, that immutable source of truth, is what Redpanda powers. And I think if we do it cost-effectively, and in a way that's really easy to use, the world will be built differently.
Yeah, that's an incredible vision. And we certainly see a lot of that in the vision of the product teams that are building with data engines and cloud systems, right? It's all about optimizing things over time, and there's this iterative approach: let's start with batch processing to move some data around, process things with, say, Spark, and then load it into files. Then people query those files either through warehouses or table formats, what have you. But every time I've seen the business chime in and really look for cutting-edge applications and ways to monetize their data, you don't want to be limited by the speed of your internal data processing. So I think your point is certainly well taken about real time being the future, and about enabling engineers with the infrastructure to actually ship with it in a way that's cost-effective, convenient, and practical to build. Now, on the data streaming side of data architecture patterns: Apache Kafka is very popular, has a large open source community, a lot of adoption. How does Redpanda either compare to or complement Apache Kafka? So, Apache Kafka is a really big project, and we stand on the shoulders of giants. We simply couldn't be here as a company, advising the world's best businesses, if Kafka the project hadn't taken off, and, more importantly, if Kafka as a concept hadn't taken off. That was a pretty big shift from a decade ago until now. Kafka is the most popular version of this, right? Prior to it, you had older companies like TIBCO and Solace, and then earlier projects.
Like RabbitMQ and so on. But what Kafka did really well, in my opinion, is build a really incredible community. If you look through a slightly different lens: there's obviously the GitHub project, and that's really important too, but the GitHub project is the implementation of the broker and the signature client, which is the Java client. Those were the tip of the iceberg, if you will, for people changing the way they think about building infrastructure. What followed was this huge wave of community projects and contributions, and to a large extent Kafka became the lingua franca of streaming. It's almost synonymous with a different way of thinking and building, in that people will refer to Redpanda, Kafka, and others in the space as Kafka-esque, or the Kafka ecosystem, but they're really very different implementations. In many ways, I think the most successful part of Kafka has been the community, and in that way Redpanda really extends the community. It's sort of like when Clang and GCC started competing: obviously we do compete at a broker-for-broker level. So for people listening in, you can think of Redpanda, in the most tactical sense, as a drop-in replacement for Apache Kafka, in that you have brokers; obviously you don't need the ZooKeeper or the schema registry, that's all built in. But as a concept, it's more helpful to understand that Kafka the protocol became this universal layer for doing real-time things, and Redpanda happens to have a particularly strong set of skills around ultra-low latency and really large scale. And so our customers.
Just for context, our customers push like 30 gigabytes per second, sustained, for some of the largest telemetry use cases in the world, or for pushing autonomous vehicle data, or large-scale IoT infrastructure. So we happen to be a really good fit there. But at an ecosystem level, I very much think of Redpanda as an extension of the entire Kafka community, where all of the existing drivers just work. It's almost like the electric engine versus the combustion engine comparison, and Redpanda would be the electric engine in this example, but the rest of the car stays the same. You still need tires, you still need a steering wheel, you still need a bunch of these things. So that's how you can think tactically about Redpanda the product; that's not the company, or why people end up paying for it, since obviously we've built more things around it. It is a different implementation that supports the entire Kafka protocol. And we very much see ourselves as part of the Kafka community: yes, it's a replacement of the broker itself, but it's an extension of the community for all of the client libraries, which, by the way, are an order of magnitude larger than what people tend to think of as Kafka, which is this Java broker implementation of, you know, 400,000 lines of code. The number of clients and libraries and code bases and databases that connect to the Kafka protocol is in the millions of lines of code, right? So the ecosystem that connects to the broker, and therefore to Redpanda as well, is much, much larger than the actual protocol implementation. Does that make sense? Absolutely.
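The "drop-in replacement" point can be illustrated from the client side. Assuming a standard Kafka client such as kafka-python, the application code that talks to Redpanda is the same code that talks to Kafka; only the bootstrap address changes. The hostnames below are placeholders, and the sketch just builds the producer configuration rather than connecting to a live broker.

```python
def producer_config(bootstrap: str) -> dict:
    # These keys mirror common kafka-python KafkaProducer parameters:
    # bootstrap_servers, acks, client_id.
    return {
        "bootstrap_servers": bootstrap,
        "acks": "all",
        "client_id": "orders-service",
    }

# Same application, two possible endpoints (placeholder addresses).
kafka_cfg = producer_config("kafka-broker:9092")
redpanda_cfg = producer_config("redpanda-broker:9092")

# Everything except the endpoint is identical: topics, serializers,
# and the rest of the application are untouched.
diff = {k for k in kafka_cfg if kafka_cfg[k] != redpanda_cfg[k]}
```

In a real deployment you would pass this dict to the client constructor; the point of the sketch is that `diff` contains only the bootstrap address, which is what "drop-in" means at the protocol level.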
And some components that you mentioned with Apache Kafka: the most common Java client, ZooKeeper, which is getting phased out, and schema registry. What are some of the core architectural differences between Apache Kafka and Redpanda? Yeah, I would say let's go all the way down to the metal and then build our view of the world up from there. When I started writing the original code base for Redpanda, I didn't intend it to become a product, I didn't intend it to become a company; I wrote it for me. I was in Miami in my apartment, I had left Akamai, and I didn't know if I wanted to build a company. I just built the product that I wish I'd had. What it came down to is that sometimes you're lucky enough to reinvent the wheel when the road changes, is the thing I like to say. And what changed fundamentally is this: Kafka still uses a work-stealing algorithm. If you git clone Kafka, it uses a bunch of thread pools with a work-stealing-style algorithm to process Java threads and dequeues, right? It has a set of Java threads for doing networking, a set of Java threads for doing I/O to disk, and a set of Java threads for doing a bunch of other things, like compaction. In fact, it was using Bobby Blumofe's work-stealing algorithm; he used to be the CTO of Akamai, or maybe still is, I need to double check. And the insight is that the rise of many-core systems led to a different fundamental design when you take a lens on the hardware, which is that the communication overhead to keep these things synchronized tends to dominate the performance of these large systems. In particular, the locking semantics of work-stealing threads across multiple cores get even worse on multi-socket motherboards, right? If you have two separate CPUs, it's sort of a disaster from an inter-processor communication perspective.
And so at the lowest level, it was about building a software stack in a concurrent paradigm where parallelism, aka simultaneity of execution, was a runtime variable. We started with a framework called Seastar, which, for those listening in, is just a network of SPSC queues with really nice primitives around preemption and coroutines and scheduling, primitives that let you think of the world as a group of single threads communicating with each other over a network of single-producer, single-consumer lock-free queues, and then you get to build from that. And it turns out that the Kafka protocol, at an end-to-end application level, has this steady-state leadership: when a client connects to a partition leader, it maintains that TCP connection. So it so happened that the stuff I was really interested in at the time worked perfectly with the end-to-end producer and consumer protocols of steady-state leadership. Said another way, I don't think I would have built Redpanda the same way if the client protocol released and reacquired the locks around random nodes; I think I would have built a totally different system behind the scenes. But because of the end-to-end protocol, we wrote Redpanda using a thread-per-core architecture, which is this message-passing infrastructure I was describing, and that yielded really strong latency improvements. The speed it takes to go from point A to point B, in other words to actually do the job, came down to saving a bunch of microseconds across the stack. Performance is a topic that is very near and dear to my heart, and we could spend hours on it, so we won't go down the whole rabbit hole.
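The thread-per-core idea Alex describes can be sketched in miniature. This is a toy Python model, not Seastar's C++ implementation: each "core" is a worker thread that owns its own single-producer, single-consumer inbox and its own state, so the data path needs no locks, and messages for a given partition are always routed to the same core.

```python
import threading
from collections import deque
from time import sleep

N_CORES = 2
inboxes = [deque() for _ in range(N_CORES)]   # one SPSC queue per "core"
results = [[] for _ in range(N_CORES)]        # per-core private state
stop = threading.Event()

def core(i: int) -> None:
    # Each worker only ever touches its own inbox and its own state,
    # so no locks are needed on the data path. Drain before exiting.
    while not stop.is_set() or inboxes[i]:
        if inboxes[i]:
            results[i].append(inboxes[i].popleft())
        else:
            sleep(0.001)

threads = [threading.Thread(target=core, args=(i,)) for i in range(N_CORES)]
for t in threads:
    t.start()

# A partition is owned by exactly one core: the partition id picks the inbox,
# so events within a partition stay ordered and stay on one core.
for partition, msg in [(0, "a"), (1, "b"), (0, "c")]:
    inboxes[partition % N_CORES].append(msg)

sleep(0.05)
stop.set()
for t in threads:
    t.join()
# results[0] holds partition 0's messages in order; results[1] partition 1's.
```

The real system replaces the polling loop with cooperative scheduling and futures, but the routing invariant is the same: one owner per partition, message passing instead of shared locks.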
But I tend to think of latency as the sum of all your bad decisions, and building from scratch allows you to learn from the systems that came before you in the context of modern hardware, which is the only real abstraction. You can always write software on a new abstraction: the JVM is a great abstraction, Node is another great abstraction, the Python VM is another great abstraction. But the only true abstraction is actually hardware. What CPU family are you on? What is the throughput of the bus between memory and the CPU? That's ultimately what matters. And if you take a fresh look, I think people would end up with an architecture that looks like this, right? Yeah. And that's incredible, operating at such a low level on the hardware. And of course this has positive impacts, benefits, I should say, for overall operations, costs, and performance. So how do your users typically talk about the gains they get from using Redpanda over, let's say, Apache Kafka by itself? To talk about real production use cases: we've taken 380 brokers down to 24, and different sizes at different scales, so multi-million-dollar savings. It's not uncommon for the kinds of workloads we have. We feel really lucky in that, look, when I launched the company, first of all, in 2019, people thought it was dumb to write a startup in C++. I'm not sure I knew of another startup, maybe one or so, that was building in C++. Now all the LLM models, by the way, are written in C++, but at the time there was this really strong reaction, like, hey, this is not a great idea. For me it was obviously the familiarity of it. But as an engineer, you just get to make decisions; you don't have to ask for permission. You get to build what you truly feel is true to you, for you, not for anyone else.
And then in 2020 we launched. We got very lucky in that, at the time, the only real option other than building it yourself with Apache Kafka was one other company, and they're a great success; I think Redpanda would be smaller if it wasn't for them. It's a great company, lots of success. But in practice, what it meant is that we went to market with the world's largest workloads. And so 2021 was not a whole lot of fun. It wasn't a fun year. I remember 14-hour days being pretty common, Sundays, seven days a week, just to patch bugs and restart systems. The scale was just so much larger than I had thought. I figured, oh, we'll just start with small startups, you know, tens of megabytes per second or whatever. And it wasn't the case; we launched with multi-gigabyte-per-second workloads. So I guess the context of this is that the types of relationships and partnerships we have tend to exist because of the way we grew up, because of the initial market traction, because of how we went to market, typically with the largest workloads in the world. And so the savings people are hearing from me are real production workload changes. The last one, kind of on the more dramatic end, was taking 150 computers down to like 27 or something like that, doing tens of gigabytes per second. So paying attention to how hardware is built can yield categorical savings for people at the infrastructure level. And it's not just hardware: the last piece is that it means operations are a lot easier. It is a lot easier to manage three computers than, you know, 50.
And so there are a lot of positives around team size and a bunch of other things that come down from that. You know, one of the other interesting new trends in the data industry is optimizing for small data workloads. Not to say everyone has small data, but something that's portable and easy to manage, right? Because you don't want to jump to using a Spark cluster just because you have some analytics requirements. How does Redpanda play in there? Yeah, great question. I'm a huge fan of DuckDB, and a friend of mine, Glauber, is building Turso with Pekka, a SQLite database fork that's really designed for what people tend to think of as small data. Two things have happened. One, computers happen to be really good now. A data scientist of mine was playing with DuckDB, and it was like, 10 gigabytes? That's okay, just download it onto your laptop. That whole concept has shifted over time, and these frameworks are super powerful. It turns out in practice that because Redpanda can use a single pthread, at the Linux kernel scheduling task level, a single pthread and like 100 megabytes of memory or something like that, super, super tiny, at the edge it is often paired with other technologies, whether it's DuckDB or SQLite or another embedded processing framework. We just bought a connectivity company, eight or nine weeks ago or so, that was used for embedded work. Europe's largest electricity firm, the firm that generates the most electricity in Europe, was using this framework we bought, called Benthos. And the reason is that they said, hey, we're using these two technologies together, and by the way, it was actually at the edge, on wind turbines as one example, or at sea. So that's the lens I get to see it through.
It's this embedded mode, less so on the laptop, or single-node production deployments, perhaps. That's not where I get to have conversations these days, though we obviously have people who have tried it. Where I see the small footprint being a benefit tends to be discontinuous streams of events at the edge, whether it's a flying helicopter, or wind turbines, or monitoring systems. That's where what used to be an overwhelming amount of data meets this new wave of systems like DuckDB and Turso and Redpanda and what used to be called Benthos, now Redpanda Connect, and they're just a really strong fit, because the engineer just wants something easy to use, and these things are just so powerful. Absolutely. And you know, I think there are a few perspectives on how to make data easy. There's one which was very popular in, let's say, what I call the zero-interest-rate phenomenon of data, which was: hey, all these SaaS data vendors will manage everything for you, and that's how it's easy, because it's just going to be in their cloud, right, and they're going to abstract everything away from you. Now, there are trade-offs. There are obviously advantages: you don't have to deal with the infrastructure, and it's a URL or a UI that you just log into and click some buttons. But then there are the drawbacks, which is that it's sort of a black box, and things like that. What are your thoughts on fully managed as a mode of simplicity, versus portable, lightweight compute? Great thoughts on the zero-interest-rate phenomenon; it's been an interesting thing. We grew up with what I would call a calorie-restricted diet. We had to be very judicious about how we spent the money for the company: who we hire, and why they're building what they're building.
What is the focus, more so than others that came before us. And about three years ago, in talking to customers, we saw two trends. The zero interest rates was one, but also this rise of what Google at the time called the sovereign cloud. AWS Outposts came out, and there was this weird trend where the hyperscale clouds were now offering, like, your own cloud near your company. And this shift is mostly because of privacy. All of this is public information: in Germany, for example, they require cloud vendors to sell them a fully sovereign cloud operated by Germans, where the software stack is actually that of the hyperscale cloud. When you combine both trends, in my view, you get a different world. The world today is a place where it's okay to send your data to the Snowflakes or the Databricks of the world, but in many ways you've jailed your data. That was the zero-interest-rate phenomenon. Then, if you pair it with the trend of privacy and sovereignty, you ask: how would the world look if I were to reinvent it? So a few years ago we got really publicly criticized by a bunch of people for this idea we called Bring Your Own Cloud. The concept was really quite simple, and security companies, by the way, have been doing this for a long time, though less so in the data space: it's the idea that you divorce the control plane and the data plane, and the data plane lives in the customer's network. The critique was that as a startup, you give up the Amazon margin markup. If you take all these vendors, it's very common for them to take the list price and put a 30 percent margin on top.
And obviously inflates your, your ARR, but I was like, as an engineer and, and, you know, or I guess it's a CEO that grew up as an engineer sometimes, and there aren't that many times where you get to make a few decisions for the company, you're like, you know, this is how we're going to go build it. And one of those is like, I want to build that product. That was the product I wish I was able to consume before where I own my data. It sounds absurd when you say it out loud, because it is, it's like, okay, I can send you my data and some vendors will even allow you to send their data for free, but every time you access it, you have to pay for it. And so what it does is that introduces this weird cost model for the engineer, where you're like, it's no longer about feature velocity. It's no longer about like actually building the best product, but it's, you always have this like really strong financial constraint or like, Oh, I can't use product X because it's too expensive, or I can only run. Two queries or whatever it is. And so the shift is a place where people own their data in their own, infrastructure. And the difference between privacy and sovereignty is that privacy is a checklist. You're going to take John, you're going to scramble, you're going to The social security number, and you're going to encrypt the data address, whereas sovereignty reduces to either you do or you do not own the hard drive life cycle where the data lives. And so, in my opinion, the future is going to be BYOC for all data intensive applications is otherwise too hard. too costly, too difficult to scale with the amounts of data that people are generating today. Yeah, it's, it's absolutely an exciting idea. And, you know, I think ultimately it's about being flexible and, and allowing data engineers and enterprises and the teams that are actually using these systems to not, you know, give away the keys to their data. 
To someone that, like you said, you know, charges you every time you, you have to access it. And, you know, even, even with the work we do here at Stream where, you know, we have fully managed change you to capture and we have self hosted change you to capture. And it's ultimately a choice, right? There's trade offs to this, but you know, when you try to force an enterprise down a certain path, you know through, you know, pricing model or whatever it is that's where these trade offs tend to favor, you know, poor outcomes, right? So, Making it very simple and flexible is always advantageous for everyone, honestly. So, yeah, I think the idea of BYOC is certainly very exciting and certainly rising. And I also want to ask you, you know, generally, like in the, in the software ecosystem, in the data architecture ecosystem, you know, what are you most excited about? I am super, super jazzed about the tailwind that, Hey, I is causing for is is is accelerating the adoption of real time systems. It doesn't it wouldn't make sense for chat for you to enter your question. I get an email at midnight. That is like, oh, this question that he asked me to summarize at lunch. This is the answer. You're like, I have no idea why I ask you. Maybe in my case, this dumb question at lunch, I was just trying to understand this website and I didn't want to do the work of trying to get it summarized. And so I asked you, please tell me what this product does, you know, to get it at like midnight. And so what's been fascinating to me is, really like the the pressure of both thinking about the future differently from the migration away from batch into real time. That's like it's really strong. But to like the the acceleration that this air and Jenny workloads have caused on cost on infrastructure products like Redpanda. When we just did a tour this year, I just came back from, from doing a field trip, report where I got to talk to, I think basically the U. S. 
Large banks on most of the CIOs, their concerns were actually different and something that. I think it's going to be a shift in how these AI models are built and shipped and how enterprises are going to be leveraged, leveraging gen AI. And so the classical thinking has been, if you look at Anthropic, if you look at OpenAI, if you look at all these APIs, for them has always been about driving the cost of the model calls lower so that it is not, too expensive to actually enrich most of your data. Or in other words, you know, make sure that every part of your business has, has additional intelligence on, on your data products. The concern is that if you, you know, in my interview said, if I interviewed like the world's largest hedge funds the world's largest banks, the world's largest healthcare companies, drug discovery companies, and so on, They just said point blank. We will never send our data to any models. And so I think that the opposite trend is fascinating, which is the idea of sending the model to the data. And so what we've been doing up until now, and if you ask people most AI is in, is in, you know, RFCs and documents and design and prototypes, but very few people are actually making dollars in production, with, with this models. I think someone wrote, I think Bloomberg I wrote that there's like 50 billion spent, at Goldman Sachs, excuse me, 50 billion spent, and only 5 billion is generated in revenue. So there's this huge disproportionate spend in this new technology, but very few people are actually making money on it. And the shift, in my view, Is the shift away from sending your data to a third party API and instead doing local inferencing and the insight is twofold. One, the high end models Mark Zuckerberg just wrote today on the release of LLAMA 3. 1 that the foundational models are not too dissimilar from each other. And so that, that's a pretty big insight here, in that that, therefore the cost will continue to go down. 
If you had like an order of magnitude better performance in those foundational models, perhaps businesses would incur the risk of sending your data to those models, but that's not the case with, with Gen AI. The case is that all of the state of the art is roughly around the same ballpark of performance as well. And two, you can take it. An open source model like a Lama three or a Quinn or something like that. Fine tune it. And get state of the art performance for that particular subset. And so if you're a drug discovery company trying to use this elements, you may not need to know about carpentry or like whatever, changing your, your tire on your bicycle or, or any other like thing that isn't related to this particular subject. And so if you take this open models and you fine tune it, the cost performance Of this model is already an order of magnitude over anything else. So it's both the cost effectiveness, but it's actually more important. It is the sovereignty of the data. And so, the idea that you understand the execution. Through a lineage through actually, like what was the input event with the properties at what time generated by what system, paired with a cost you know, an effective cost performance model that executes locally with the knowledge that your data never leaves. And so you can set up a firewall perimeter just for the truly paranoia and says, okay, make sure that data doesn't leave, you can get. Better cost optimization, you can get true understandability because you can trace the execution of a particular gen AI model all the way up to the source data via lightweight headers. And so that whole concept, of being able to deploy like you, customers already have the data and they have models, right? So like a Lama three, a quinn, a five, three or whatever Lama three, one that was just released today. Mistral, like the models exist, you obviously have the data because otherwise you wouldn't be a business. 
The real issue when the rubber meets the road has been this enterprise envelope of the usual things that frankly, every enterprise has done authentication, authorization, access controls, audit logs, et cetera. And so if you pair that. With the idea of sovereignty, which is especially for your AI, right? So we call it sovereign AI at Redpanda. The idea is that if you invert the flow, instead of sending your data to the model, you're sending the model to the data. I think that's what will unblock the enterprise from actually making money and leveraging this new, this new technology. And I find that really cool and fascinating for a real time streaming nerd. Sovereign AI. That's the first time I've heard that. Term very exciting and certainly something that speaks to the audience of developers and large companies that are embarking on AI and want to make sure that, you know, they're not essentially adding, you know, risks. By going after the benefits of introducing AI within their company, which, you know, they're, they're, they're certainly many and a lot of opportunities for all types of companies to, to best deploy AI. But yes, I think your, your, your point about, you know, not being required to send your data to a third party API or, you know, giving the, cause it's back to the same thing where you're giving someone else the keys to your data. And, and paying again, you know, every time you want to access it. So definitely you know, an exciting trend. We'll, we'll, we'll certainly keep an eye on, Alex, where can people follow along with your work? Yeah. So I am super easy to be reached on Twitter at emacs air. No is the handle. Otherwise there's like a red button. com slash of Slack. I'm always on the community Slack. I'm trying to answer questions. It's like I'm traveling. And I feel like I could contribute to. 
Obviously with escalations and customer success, we have a professional team, but for the community, I'm always chiming in and trying to answer really easy questions. So feel free to reach out to me on both. Love that Twitter handle. I'm sure you get a lot of angry Vim people chiming in and no, I'm joking. It's actually, it's like the largest, it's a negative signed in unsigned in conversion to a signed in as the largest error in the Linux kernel. And so I found that tag at some point and I was like, Oh, that's like a fun handle. But it's technically the largest error. So, Oh, wow. Okay. Okay. Yeah. I was relating it to the the interminal Emacs, which has the same name, but thank you for educating me on that as well. Yeah. Oh, got it. Okay. Emax with an X. Got it. I thought that was you're talking about Emax with a C, which is of course, you know in terminal, programming software. Yeah, that's the editor that I've used. I still use it to this date, by the way. It's I still take some notes in, in Orc mode. I need to, I need to migrate to something better. Yeah. Well, I mean, if it works, it works. Alex Gallego, CEO of Redpanda. So thankful that you were able to join us today and talk us through your, your vision and the exciting stuff you're building. Thanks for having me, John. Appreciate it. I'm You