What's New In Data

Small Data, Big Impact: Insights from MotherDuck's Jacob Matson

Striim Season 5 Episode 2

What makes MotherDuck and DuckDB a game-changer for data analytics? Join us as we sit down with Jacob Matson, a renowned expert in SQL Server, dbt, and Excel, who recently became a developer advocate at MotherDuck.

During this episode, Jacob shares his compelling journey to MotherDuck, driven by his frequent use of DuckDB for solving data challenges. We explore the unique attributes of DuckDB, comparing it to SQLite for analytics, and uncover its architectural benefits, such as utilizing multi-core machines for parallel query execution. Jacob also sheds light on how MotherDuck is pushing the envelope with their innovative concept of multiplayer analytics.

Our discussion takes a deep dive into MotherDuck's innovative tenancy model and how it impacts database workloads, highlighting the use of the DuckDB format in WASM for enhanced data visualization. Jacob explains how this approach offers significant compression and faster query performance, making data visualization more interactive. We also touch on the potential and limitations of replacing traditional BI tools with Mosaic, and where MotherDuck stands in the modern data stack landscape, especially for organizations that don't require the scale of BigQuery or Snowflake. Plus, get a sneak peek into the upcoming Small Data Conference in San Francisco on September 23rd, where we'll explore how small data solutions can address significant problems without relying on big data. Don't miss this episode packed with insights on DuckDB and MotherDuck innovations!

Small Data SF Signup  
Discount Code: MATSON100

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common real-world data patterns, and analytics success stories.

Thank you for tuning into What's New in Data. In this episode we have Jacob Matson, a well-known SQL Server, dbt, and Excel practitioner who has joined MotherDuck as a Developer Advocate. Jacob, how are you doing today?

Hey, I'm doing great. Thanks for having me on.

Jacob, this is your second time on the pod. Last time we spoke, you were in a very cool hands-on operator role. You're very well known on Twitter and LinkedIn as one of the leading experts on being operational with things like SQL Server, dbt, Excel, ERPs, CRMs, you name it. And now you're in a new role at MotherDuck. Tell us what drew you in that direction.

Yeah, I think a lot of it came down to the shape of the problems I was facing. I kept going back to the well for DuckDB. It turned into a thing where it was like, oh, I can use DuckDB for this: I have a problem whose data is a shape I can run on my local machine, but also not so big that I need to use something like Snowflake or Spark. And that just kept happening. Then the timing worked out in terms of what they were looking for and what I was looking for, and now I'm here and I get to share all the really cool stuff that MotherDuck is enabling with the dual execution model, running DuckDB both locally and in the cloud.

Super exciting stuff, and I'm a hundred percent with you. DuckDB seems like foundational technology for the next generation of analytical workloads, and honestly all types of workloads. We're going to dive into that in this podcast, and it's also really exciting to learn about the things MotherDuck is doing to support it. But before we get into that, I want to understand, and help the listeners understand as well, why DuckDB exists and why it's special. Can you explain what DuckDB is and how it works?

Yeah, sure. I think about DuckDB as SQLite for analytics. It's an embeddable, in-process database, which means a lot of the things you need to worry about when running a database in a distributed system, or with multi-tenancy, you can forget about. You don't need to worry about security models, for example; it's all running locally. When you're running DuckDB, it's in-process, and it has access to what it has access to. There's no such thing as row-level security or role-based access control or anything like that. So, broadly, it's a simpler way to reason about the data model, because you're not thinking about the things you'd really be concerned with in Postgres, like MVCC, multi-version concurrency control. There's one person reading a DuckDB database, so who cares? Those types of things are a little bit different. Obviously there is the ability to write and do updates, and it is ACID compliant, so there are some controls there, but it's not at the same level as something like Postgres.

The other thing is that it's built for multi-core machines from the ground up. One of the really cool things about what Hannes and the team at DuckDB Labs have built with DuckDB is that they recognized that the next way computers scale out, from a compute standpoint, is by adding more cores. That's different from the world of, let's say, 2000 or 2010, where we would maybe have two or four cores on our machines, and when you were sending a Spark job to the cloud or to your cluster at that time, it was analogous to one core.
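To make the in-process, multi-core point concrete, here is a minimal sketch of DuckDB from Python: no server to stand up, and an aggregation over a Parquet file is split into chunks and executed in parallel across the local cores. The file name and the thread setting are placeholders for illustration.

```python
import duckdb

# DuckDB runs in-process: connecting is just a library call, no server involved.
con = duckdb.connect()  # in-memory database; pass a file path to persist instead

# Optional: the thread count defaults to the machine's core count.
con.execute("SET threads TO 8")

# Aggregate straight off a Parquet file (hypothetical path); the scan and the
# group-by are parallelized and the partial results combined at the end.
top_routes = con.execute("""
    SELECT origin, dest, count(*) AS flights
    FROM 'flights.parquet'
    GROUP BY origin, dest
    ORDER BY flights DESC
    LIMIT 10
""").df()
print(top_routes)
```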
Now we have machines like the one in front of me right now, which has 14 cores. So if you just do the math over the last 10 years, you have 14 times as much power as you did a decade ago, assuming your clock speed stayed the same, and it didn't; it's faster too. So we have that happening. Meanwhile, it requires a different way of thinking about your software to build it to run on multiple threads. That's something that, for example, R does really well: running in a vectorized way. DuckDB has done the same. When it's executing queries, it breaks them into parts that can be executed in parallel and combined back together, and because of that you get all the advantages of maxing out your local cores, because it's vectorized. That's unique to DuckDB, and it's an intentional design difference from transactional engines, which are typically thread-bound. That doesn't mean they're always thread-bound, but it basically means that when you write a query, it runs on a single thread. DuckDB says: since I'm in-process, I can use all the cores available to me, take all that data, suck it up, and jam it through. So what's unique about DuckDB is that it's built from the ground up to execute on machines with 10-plus cores, and that architectural advantage serves it really well for analytical workloads.

Absolutely. And this is what we were talking about: the fact that DuckDB is so optimized for the type of hardware, the metal, that most people have now in their laptops, or even in very easy-to-deploy cloud instances, gives it so many built-in advantages. And the best part is that it's so practical. I've used DuckDB myself, and you're able to bake it into your operational workloads. And I think MotherDuck brands it as multiplayer analytics, right?

Yeah. And I can talk a little bit about the difference between MotherDuck and DuckDB, if you want to go there.

Yeah, let's get into that, because I know MotherDuck, Jordan Tigani and team, are doing excellent work there to make DuckDB more scalable and better adopted by the community of people who can get value from it. So I'd love to hear what MotherDuck does.

Yeah, sure. We are a cloud service provider sitting on top of DuckDB, basically operationalizing DuckDB for the cloud. DuckDB is built and maintained by DuckDB Labs out of the Netherlands, and there's an open source foundation that DuckDB is a part of. MotherDuck is obviously a close partner with them, but DuckDB is not our project; DuckDB lives in the foundation and is stewarded by Hannes and Mark and others. What we're doing is saying: hey, there's this really powerful single-node database engine available, how do we make that work in the cloud? All the things I was talking about earlier that are benefits of DuckDB become challenges when working in the cloud. It uses all the resources available to it. It doesn't have a security model, so that means we have to build one. You have access to either the entire database or you don't, for example; that's just how DuckDB works, which is very different from a traditional database where users have access to specific tables and rights to different actions on those tables. DuckDB has no considerations for those.
So how do you make that work? How do you make things like incremental storage work? How do you extend all the capabilities that exist in DuckDB and make them easier and faster to use with other cloud services? That's what we're thinking about. The second part is that your local machine is pretty powerful, but there are even bigger machines in the cloud. So we have this notion we call hybrid execution, where the query planner, when it runs, actually assesses where the data is, meaning is it local or is it in the cloud, and then decides where to execute different parts of the query. Some of it can execute in the cloud, some of it can execute locally. What that really means is that you're maximizing your existing hardware, and it's also cheaper to run, because instead of using EC2 it's using your local compute. There's a whole bunch of stuff around that that's really cool. The other piece, semi-related to this notion of dual execution, is that we have a WASM engine for MotherDuck too, so it can run in your browser. The MotherDuck app is actually powered by our WASM connector, which is really cool. That means that as you're interacting with it locally, it can run a bunch of queries locally and then go to the cloud when it needs more RAM or more data that isn't available locally.

Okay, that's excellent. That's actually one thing I want to drill into a bit, and there are a few things here that sound very exciting. With MotherDuck specifically, like you mentioned, there's a hybrid execution engine, and it can tell: okay, is the data local to this machine or is it in the cloud? Are there parts we can execute locally? Are there parts we can execute in the cloud? Are you saying I can do a join between data locally and data in the cloud? Is that how that works?

Yes, that is correct.

Wow. Okay. That's pretty powerful. And it really does seem like for a lot of analytical workloads, you don't always need a Spark cluster. But that's because of the way the data is laid out and organized: okay, I have my data warehouse vendor, or I have my lakehouse, and that thing by default is going to run these big expensive queries. That seems to be what we standardized on as an industry, not because it's the best way to do it, but because it's convenient. Now it seems like there's an even more convenient and more efficient way to do this.

Yeah, that's fair. That's certainly our angle: we want to be the easiest to use, but also the most cost effective. It's a hard line to walk, for sure. And I think there's a lot of ground yet to be explored in terms of how you can scale out DuckDB. One thing that's interesting, if you compare it to other cloud vendors, is that you don't need to operate a lot of those with optimization in mind, and you don't necessarily need to with DuckDB either. But what's interesting is that users tend to think about it more because they're using DuckDB, which is a really cool synergy in the offering, because you end up with users that love to think about optimization. It's a really nice feedback loop. For example, someone in the dbt Slack the other day described a really cool setup where they were running their pipeline using only local compute, and then the last step in their dbt pipeline was cloning that database into MotherDuck, just with the COPY DATABASE function.
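As a rough sketch of the publish pattern described here (build everything with local compute, then push the finished database up), something like the following should be in the right spirit. The file and database names are hypothetical, a MotherDuck token is assumed to be available in the environment, and the exact statements that dbt user ran may have differed.

```python
import duckdb

# Open the local DuckDB file the pipeline (e.g. a local dbt run) produced.
con = duckdb.connect("analytics.duckdb")   # hypothetical local database file

# Attach MotherDuck; authentication is picked up from the motherduck_token env var.
con.execute("ATTACH 'md:'")

# Create the target database in MotherDuck, then copy schema plus data into it.
con.execute("CREATE DATABASE cloud_analytics")               # hypothetical name
con.execute("COPY FROM DATABASE analytics TO cloud_analytics")
```

There are other ways to express the upload step; the point is simply that publishing the locally built database is a couple of SQL statements rather than a separate loading pipeline.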
So that's a really cool way to think about using it that I hadn't even considered, just because our users tend to be optimizers. There's a lot of really cool stuff where it's like, oh, we can run our pipeline all locally, and then the last step is we publish the file into MotherDuck. So think about what that pattern looks like if, say, you're a data app vendor and you want to create analytics that are really fast for your customers, and you need to run this pipeline on an hourly or daily basis. How do you achieve that at scale? One option is you can run a bunch of stuff locally and then just ship it into the cloud for execution, with our WASM engine, which is really cool too.

So the WASM engine, and WASM stands for WebAssembly, means it runs within the browser. Is the browser itself instantiating DuckDB, the entire database?

Yeah, it does. Like in my MDS-in-a-box project, which you can find at mdsinabox.com, that is using just regular DuckDB WASM. But MotherDuck also has its own fork of that which supports the additional MotherDuck functionality. I think it's limited to half the RAM, and it may be single-threaded, so it doesn't have all the advantages of DuckDB running on bare metal. But even then it's really fast and can do some really cool stuff. My favorite example of it is the Mosaic project, which is a visualization library built by some of the folks behind Streamlit and Vega and others, and behind the scenes it is DuckDB WASM. What that means is you can load, say, a 20 million row data set and get millisecond-level latency interacting with it. What I've heard from them is that their goal is 60 FPS as a bare minimum in terms of interactivity with the visuals they're building. If you think about how you actually make that happen, you can't have a round trip to a database, meaning a database in the cloud; it needs to all be local. So how do you bring that power to a local experience? WASM is one path, and I'm very excited about the types of scenarios that makes available.

I remember when I was a software engineer working on business intelligence, specifically dashboarding, the biggest problem wasn't just building a nice chart. It was actually managing the data from the database, right? Because if you let a user go in and put in some SQL, the browser is going to try to get this big JSON response of all the data points, and it can throttle and choke the browser with not that much data, honestly, because there are other inefficiencies there. Now what you're saying, and it sounds like the Mosaic project is leading this, is a more efficient caching and rendering layer.
Yeah, definitely. I think there are two parts in there, and you actually hit on something I want to talk about: MotherDuck's tenancy model, which is unique. I think database pressure is one of the challenges with analytical workloads, but I'll come back to that in a second. If you're using, I don't know, let's say D3 for visualization, you're probably pulling JSON over the wire and then populating charts with that. And that's fine, JSON is a great format.

But if you can use the DuckDB format, which is what's in WASM, you get something like 10x compression. That means that what is 100 megs of JSON can be something like 10 megs of DuckDB. Or, more likely, what it means is that instead of being limited to 100 megs, hey, I can only put 10,000 rows in this visualization, you can now get 10x the rows just because of one dimension, which is compression. Then you have a faster query engine: instead of using JavaScript primitives to interact with the data, you can use SQL primitives, which are going to be faster because they're based on DuckDB, which is optimized for query speed. You add up all these little advantages, and all of a sudden you can actually start to get to interactivity that is, for all intents and purposes, real time. Like I said, one of my favorite data sets in the Mosaic demo is a 20 million row flight data set, flights and whether they were on time and that kind of stuff, and you can just drag around and interact with it and see the data come back in real time. It's crazy.

So I'm going to ask you a very pointed question, and you're not allowed to say "it depends." Okay, so everyone just be aware there's that qualification here. Can I replace my BI tool with this?

Can you replace your BI tool, like with the Mosaic stuff? Probably not. Yet. There are a couple of vendors here: the Evidence.dev folks are using DuckDB WASM today with their BI tool for interactivity, and I believe Observable is as well, and there might be others. I know Hex and Mode have DuckDB pieces doing various SQL bits. But I think the main use case for the WASM stuff is actually more that, if you're building a data application, instead of having to embed a BI tool like Looker, you can roll your own and build a best-in-class experience using MotherDuck WASM.

That's incredible. So let's say I have a modern data stack, and you know this very well because you built MDS in a box. Let's say I'm on a data team: we have ingest, we bring it all into a data warehouse like BigQuery or Snowflake, and we have Looker reports sitting on top of this. Where does MotherDuck come into the equation here? Is it adding to this? Is it replacing parts of this? That's what I want to hear.

Yeah, I think the positioning we're thinking about is that you don't actually have that much data. For folks that really need BigQuery and Snowflake, they're always going to need it. But we think the majority of the market actually doesn't have that much, and that you can use MotherDuck as the data warehouse. I think what that looks like in the short to medium term is probably an "and" question: hey, we're going to use BigQuery for X, and maybe our marketing team is going to use MotherDuck. But notionally, where we want to go is: whatever your data workloads are, we want you to be able to bring those into MotherDuck.

Yeah, absolutely. And your point that most people don't have really big data seems to be corroborated by multiple benchmarks and publications. A really well-known one was some of the data released by AWS Redshift, where the majority of workloads are not at petabyte scale; they're in the low terabytes.
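Picking up the rough 10x JSON-to-DuckDB compression figure from a moment ago: the actual ratio depends entirely on the data, but it's easy to check on your own files. A small illustrative sketch, assuming a newline-delimited events.json file exists locally (the file name is made up):

```python
import os
import duckdb

# Load a (hypothetical) newline-delimited JSON file into a persistent DuckDB file.
con = duckdb.connect("events.duckdb")
con.execute("CREATE OR REPLACE TABLE events AS SELECT * FROM read_json_auto('events.json')")
con.close()  # closing checkpoints the database file

json_mb = os.path.getsize("events.json") / 1e6
duck_mb = os.path.getsize("events.duckdb") / 1e6
# The ~10x figure from the episode is a rule of thumb; your ratio will vary with the data.
print(f"JSON: {json_mb:.1f} MB  ->  DuckDB: {duck_mb:.1f} MB")
```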
And it is worth mentioning that maintaining a data warehouse has a built-in floor cost, which comes from the scale that the warehouse is supposed to offer, but most people may not need that scale. So now, to make this a bit more real, let's say I'm storing a few terabytes of data in BigQuery. Do I actually need big data processing to use that data?

At the scale of a few terabytes, I don't think so. The answer, of course, depends on requirements. How much of that are you really going to process at any given time? What do your partition strategies look like? The nice thing about tools like Snowflake and BigQuery is you can ignore that stuff: I'm just going to put a bunch of crap in there, maybe I'll use it later, and if I query it, it can spin up a huge instance to handle it rapidly. But on the flip side, is that really what good data stewardship looks like? Do you want to have that much data lying around? Should you be able to query it that way? I'm not going to say I don't know: no, you should be more intentional about it. But also, a few terabytes? Yeah, no problem in MotherDuck. You do have to think a little bit more about your partitioning strategy; you can't just jam 10 terabytes in there. But overall, I've been very pleased with the ground we've been taking in terms of executing against potentially large loads.

The other thing you have to think about is that DuckDB has best-in-class compression. One terabyte in a Parquet file is not the same as one terabyte inside of DuckDB. That's something to think about too. I was talking about 10-to-1 compression earlier; that's probably not exactly right, but most JSON probably compresses around that ratio. So if you're dealing with, I don't know, JSON blobs in S3 or something, you could have a few terabytes of those, and then you can just compress them, and all of a sudden it's a few gigs.

I've seen that quite a few times with JSON and other less compressed formats. So DuckDB has its own native format in addition to Parquet, is what you're saying?

Yeah, the DuckDB storage format is incredible. It has its own serialization format.

Okay. And is the way to get data into that format just inserting directly into DuckDB via the actual DML API that it has?

Yes, that's right. With MotherDuck, it's as simple as putting "md:" and then your database name to connect to it, and then you can just use regular DML operations from Parquet files, CSV, JSON, Arrow tables, data frames. It has a whole set of APIs out of the box that are all pretty fast for loading data, and you very rapidly start running into networking constraints on just moving the data from A to B. Which is really where you want to be on that stuff: if you can hit the networking threshold, you've pretty much saturated what you can expect to do. And there's lots of cool stuff there. We have a Fivetran connector, for example, as well as one for Airbyte and some others, and dltHub has a good one too. Getting data in is easy, but it could be easier, and we're hoping to keep pushing the envelope on what best-in-class performance looks like in that area.

Yeah, absolutely.
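A small sketch of what that connection-and-loading story can look like from Python, tying together the "md:" prefix, the built-in file readers, and the earlier point that local data can be queried right next to cloud tables (the database and file names are hypothetical, and a MotherDuck token is assumed to be set in the environment):

```python
import duckdb
import pandas as pd

# "md:" plus the database name connects to MotherDuck; auth comes from motherduck_token.
con = duckdb.connect("md:my_db")  # hypothetical database name

# The usual DuckDB readers handle loading: Parquet, CSV, JSON, Arrow, DataFrames.
# The source files here are hypothetical.
con.execute("CREATE OR REPLACE TABLE raw_orders AS SELECT * FROM read_parquet('orders/*.parquet')")
con.execute("INSERT INTO raw_orders SELECT * FROM read_csv_auto('backfill.csv')")

# A local pandas DataFrame can be referenced by name in SQL and joined against the
# cloud table; the planner decides which parts of the query run locally vs. in the cloud.
signups = pd.DataFrame({"customer_id": [1, 2, 3], "plan": ["free", "pro", "pro"]})
result = con.execute("""
    SELECT s.plan, count(*) AS orders
    FROM raw_orders o
    JOIN signups s ON o.customer_id = s.customer_id
    GROUP BY s.plan
""").df()
```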
The other thing that's very interesting is, yeah, you could have your warehouse where you store all your data long term, but there are a lot of ad hoc analytics jobs that you can potentially do with DuckDB. So how do you see companies balancing what's in their warehouse versus what they do in DuckDB?

Yeah, I think ideally they're thinking about those as one and the same with MotherDuck. But one thing that's really interesting that we've built out, and we use MotherDuck for our own data warehouse at MotherDuck, is a sample-to-local script we can use for development. What that means is you can run a script and get a thin copy of the entire data set, which means you can do true local development for very low cost. Obviously, you're just paying for egress at that point, which is really cool. That's something you can't really do with other databases, at least not in an easy way. We've all been in the state of, oh, we need to restore a backup so that our dev environment can run and not fail on whatever this test is. That's really hard. But DuckDB opens up some new paradigms where you can sample to local, which is really cool. In fact, there's a really cool open source package I just saw the other day where someone built that for Snowflake, so you can basically proxy your Snowflake with DuckDB.

I think I saw that, actually. Yeah, that's pretty cool. And this also comes back to your point about MVCC, right? Why does MVCC matter? Because you don't want a lot of analytical users running OLAP-style queries on Postgres while it's also serving transactional workloads, because ultimately they're both going to be competing for resources, and the analytical queries aren't super efficient. So now DuckDB is its own super efficient compute layer. I'd love to hear how DuckDB adds value there too.

Yeah. One thing that's really cool that we're doing at MotherDuck is that every user gets their own duckling. That duckling is our notion of VM compute. What that means is that if I have a data warehouse and I'm connected to it as Jacob and you're connected to it as John, our queries are sandboxed completely away from each other. So if I run some really dumb query doing a bunch of window functions or something super inefficient, your performance is not affected. That's something unique about our tenancy model. Where that really matters is when you're serving a data app use case: you could potentially have hundreds or thousands of concurrent users. That could be a pretty big Snowflake bill if you let them have an interactive session with their data. So if you think about what that looks like in the context of MotherDuck WASM, you can feed a data set, say a few million rows, to a user, and now they're completely sandboxed off from other users. You can break the dependency there and let them run whatever crazy SQL they want to run, give them the real power of the platform you're building, without the risk of, oh, someone's running a query that made my whole application slow.

My previous company was in telco, and I definitely got my hand slapped a few times by some pretty big vendors when we ran some expensive queries on them.
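Circling back to the sample-to-local development idea above: the script Jacob mentions is MotherDuck's own internal tool, but the general shape is easy to imagine. A purely illustrative sketch, with a hypothetical database name, naive sampling, and no handling of schemas or views:

```python
import duckdb

cloud = duckdb.connect("md:my_db")            # hypothetical MotherDuck database
local = duckdb.connect("dev_sample.duckdb")   # thin local copy for development

# Pull a small sample of every table down into the local file.
tables = [row[0] for row in cloud.execute("SHOW TABLES").fetchall()]
for table in tables:
    sample = cloud.execute(f"SELECT * FROM {table} USING SAMPLE 1%").arrow()
    local.execute(f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM sample")
```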
That is definitely something to think about from a data app standpoint: how do you let users have the power they need without impacting the performance of the system as a whole? That's part of why we ended up with REST APIs and no web scraping and all this stuff, just in how the front end talks to the back end. Ultimately, a lot of it is just to prevent users from being able to DDoS their own server. And I think that's a case where we actually welcome that type of workload: bring it into WASM, and the only person you impact is yourself.

Yeah. When you look at a lot of the classic architectures that are proven to work for balancing transactions and analytics, it usually comes down to a couple of patterns. One is change data capture from the operational database to the data lake, where you're offloading data for reporting purposes. And then you also have concepts like read replicas, which are essentially another form of change data capture, just to a copy of the database. Now, I think one of the super interesting things is that you still might have this notion of change data capture, but you could have HTAP, where the majority of the compute is handled by this super efficient embeddable OLAP engine that's reading from the database or a copy of the database. So I do think there are some fundamental patterns this builds on top of, which could be very cool and exciting for people who want to roll their own HTAP.

Yeah. I think HTAP is notionally a really cool thing to attempt to achieve. I still think the reality is that you have to replicate the data somewhere else to achieve the performance you want, whether that's with CDC, or just selecting data into another table, or column store indexes on a row store table, which is how SQL Server does it, for example, and Oracle as well. Some way or another the data is being duplicated: great, we've duplicated all of the data into a column store index, but you're going to pay for it one way or another. The question is what's the most convenient and efficient way to do it. And I think this leads into something we were going to talk about anyway, so we might as well bring it up here: we're partnering with Hydra and others to bring pg_duckdb to life, which is a Postgres extension that lets you run DuckDB inside your Postgres server, and then obviously extend it to MotherDuck if you need to. That's in development now, and the project is public, so you can take a look at it. I think it's the closest thing I've seen to open source HTAP. I don't know yet what technical marketing we're going with here, but it's definitely fast analytics, whether you want to call it HTAP or not.

Yeah, it's certainly very indicative of the fact that this is the next generation of analytics and computing, with DuckDB as a foundational element, and on top of that Apache Arrow and these popular serialization formats. I know ADBC is also something that DuckDB is aligning itself to, which is super cool and becoming a standard in its own way, like what you're seeing with dltHub and DataFusion and others. A lot of what I think is really groundbreaking technology is coming through this, and it really did pique everyone's interest when you, Jacob, as an operator, someone who has actually
kept the lights on in a real-world environment with data, spreadsheets, SQL Server, dbt, what have you, looked at DuckDB, looked at MotherDuck, and said: oh wow, this is actually something that can be useful to everyone.

Yeah, that's the goal, right? Again, my personal experience was that I just kept running into problems that were DuckDB-sized. One of my favorites was at a previous company where we would get Excel files that were partitioned by tab. How do you deal with that? The answer is, you have data analysts running their heads against the wall trying to handle it with pivot tables or Power Query, and that kind of works, okay. But with DuckDB, it's trivial to rip that into a tabular format that is then super fast to query. We're talking about going from, I don't know, probably a couple of hours of screwing around in Excel to four SQL statements that got us what we were trying to get out of these files, and it took 30 seconds.

Yeah. And the fact that you can express your transformations very concisely in SQL on top of fairly complex file formats.

One of my favorite bits of DuckDB minutiae is that the Excel reader is actually part of the spatial package, so you have to install the spatial package for it to work. There's a separate Excel plugin that does something completely different; I can't even remember what it does. But yeah, using the spatial package you can do lots of fun things with Excel, including handling files that are partitioned by tab.

Yeah. Anytime you can make SQL really portable and embeddable, there's going to be tons of value you can add. And the reality is that tons of analytics and operational work is still done in spreadsheets, and that'll never change. The question is, how do you scale that, rather than everyone having their own ad hoc copy of a spreadsheet? How do we actually apply some of the fundamentals we know about data management to scaling that? I don't know if the story is super clear yet, but I think there is a real path now, with DuckDB, to making that possible.

Yeah. I think that, for better or for worse, the lingua franca of data has settled on SQL, much to the chagrin of DataFrame enthusiasts everywhere. But there's real power in that, especially as companies are more and more spread out and smaller and smaller: collaboration is critical, being aligned is critical, and having tools that enable those things is critical. So if we can make it easy to have one source of truth, and make that the easiest thing, make it as easy as a spreadsheet, I think that's some of the real power of DuckDB. It runs locally, just like Excel, it can read all the same files as Excel, and it gives you really powerful primitives to operate on them quickly. For me, that's consumed a lot of workloads where I previously would have screwed around in Excel or used Power Query. Now it's just easier to DuckDB it, because it's SQL, and I can run it again later if I need to. I think there's real power there.
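For the tab-partitioned Excel files described above, the trick Jacob is referring to is that DuckDB's spatial extension reads Excel through GDAL, with each worksheet exposed as a layer. A small sketch; the workbook path and sheet names are made up:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial")
con.execute("LOAD spatial")

# Each worksheet shows up as a GDAL 'layer', so tab-partitioned data can be
# stacked back into one queryable table; UNION BY NAME tolerates column-order
# differences between sheets.
combined = con.execute("""
    SELECT 'Jan' AS month, * FROM st_read('sales.xlsx', layer = 'Jan')
    UNION ALL BY NAME
    SELECT 'Feb' AS month, * FROM st_read('sales.xlsx', layer = 'Feb')
""").df()
print(combined.head())
```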
But a lot of that is organizational. I wanted to ask you about Small Data SF, the event you have coming up on September 23rd, like I said, in San Francisco.

Yeah, we are hosting the first Small Data Conference. Very excited; there are a lot of really cool speakers talking there. I think my favorite of the overarching themes we've been able to nail down is: what does it mean to make some of this AI stuff usable in practice? Folks like Ollama are going to be talking about that. We also just released our own embedding feature, which I'm really excited about; it's enabling some fun use cases, but I think it can turn real as well. And we're excited to spread the word there that, hey, you don't have to have big data to solve big problems. That's the overarching goal of what we're doing with Small Data.

Great. We'll have the link to sign up for the Small Data Conference in San Francisco in the show notes and description, along with Jacob's code, so be sure to use that.

Get a discount with my code.

Indeed, we'll put that in the show notes. It definitely looks like a great event; I did see the speaker lineup, and it's certainly going to be exciting. The way I really think about DuckDB is that if you want to do analytics without worrying about getting throttled by your warehouse or adding unnecessary compute, it's really amazing for ad hoc analytics right now. And it does sound like MotherDuck has some really interesting value-add to standardize and centralize that within an organization, even if you already have a data warehouse. Because sure, a data warehouse is great for storing tons and tons of data, and if one day your CEO asks, hey, I want to know the age of every single customer we've ever had, then sure, go run that against the warehouse and scan all the petabytes of data you have. But most workloads, like that report sales is asking for, or that marketing information your rev ops team is asking for, can fit on a single machine and can be much more efficient if you just do them with DuckDB.

Yep, that's the goal. And I think the amount we're going to be able to see processed by a single machine is only going to increase over time. It's crazy to reflect on where things were 10 years ago in terms of what it meant to have big data versus what it means today. If I had known in 2014, when I was working with my finance team on procuring more SQL Server core licenses, where we would be in 2024, it would have totally blown my mind. It still does, actually. It's crazy.

Jacob Matson, developer advocate at MotherDuck. You can see him in person at Small Data SF on September 23rd. Jacob, thanks so much for joining today's episode of What's New in Data, and we'll be hearing from you soon. And thank you to everyone who tuned in today.

Thank you. All right. Thanks, John. We'll chat later.