What's New In Data

From Apache Kafka to PostgreSQL: Postgres maturity, extensions, and building on Postgres, with Gwen Shapira (CPO at Nile)

Striim

What does it take to go from leading Kafka development at Confluent to becoming a key figure in the PostgreSQL world? Join us as we talk with Gwen Shapira, co-founder and chief product officer at Nile, about her transition from cloud-native technologies to the vibrant PostgreSQL community. Gwen shares her journey, including the shift from conferences like O'Reilly Strata to PostgresConf and JavaScript events, and how the Postgres community is evolving with tools like Discord that keep it both grounded and dynamic.

We dive into the latest developments in PostgreSQL, like hypothetical indexes that enable performance tuning without affecting live environments, and the growing importance of SSL for secure database connections in cloud settings. Plus, we explore the potential of integrating PostgreSQL with Apache Arrow and Parquet, signaling new possibilities for data processing and storage.

At the intersection of AI and PostgreSQL, we examine how companies are using vector embeddings in Postgres to meet modern AI demands, balancing specialized vector stores with integrated solutions. Gwen also shares insights from her work at Nile, highlighting how PostgreSQL’s flexibility supports SaaS applications across diverse customer needs, making it a top choice for enterprises of all sizes.


What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data architectures, and analytics success stories.

Hello, everybody. Thank you for tuning in to this episode of What's New In Data. I'm really excited about our guest today. We have Gwen Shapira, who's co-founder and chief product officer at Nile. She was previously an engineering leader for cloud-native Kafka at Confluent. Gwen, how are you doing today? Really happy to be here. We've been chatting about this on and off for a while and I'm glad to finally be talking to you. Absolutely. Yeah. Gwen, we've gone back and forth over the years communicating about all the latest, greatest tech in the enterprise and open source. The first time I saw you give a talk was about six years ago, at the O'Reilly Strata event in New York. You had hundreds of people in the audience listening to you talk about microservices and the future of ETL with open source and Kafka. So great to be catching up with you here on this podcast now. Really great to be catching up with you. And I have to say, I miss O'Reilly Strata a lot as a conference. There are other good data conferences these days, but I feel like Strata was special. It wasn't vendor specific. It was really unique. It was unique. It was very cool. It was in the Javits Center, which I always thought was a really electrifying, exciting place to be. Practitioners loved going there to talk about what they built, and the vendor participation was always really good and pointed, because you had to talk about how you add value to the open source ecosystems; it was always very tied to the Hadoop infrastructure in the beginning. So I agree, I do miss that conference. What are some of your favorite conferences today? Yeah, I have to say my conferences have changed a lot because I'm now part of the Postgres ecosystem. PGConf.dev up in Vancouver a few months back was really amazing. And because a lot of our customers and users are JavaScript developers, I've started going to JavaScript conferences and the ecosystem around that, which is also very different, and the entire vibe is very different. So I don't know if the conferences I go to these days are the most relevant for the data people in your audience. PostgresConf is definitely a great experience if you are in that ecosystem. And the other thing is, of course, the classics, QCon and GOTO, the big data or architecture conferences. They always have a data track, but they talk about software architecture in a larger way, and this is where I got my excitement about microservices, for example. Absolutely. And you've done great work as an engineering leader for cloud-hosted open source infrastructure for a long time, so it's not surprising to hear that now you're branching into JavaScript conferences and Postgres conferences. Nile, the company that you co-founded, is doing some incredible work in the Postgres community. Tell me a bit about the current state of Postgres: what people are talking about there, what's being announced, what are people building with it? Yeah, so this is really interesting. One thing that everyone knows, but that is hard to internalize until you join the community, is that Postgres has been around for a very long time. If you look at the source code and do git blame, you can find that this line was added 25 years ago, 30 years ago, definitely before I was working on anything at all. And a lot of the people who made those commits 25 years ago are still reviewing patches.
To this day they are still working, still employed, not even retired, and they're just still part of the community. So you have a real sense of tradition in the community, and you can also see how the community is trying to keep itself modern. They have a Discord now. This is probably a big change for people who've been working with the same community and the same people for 30 years, long before Discord existed. So it's really fun to see the community maintain a lot of traditions and also evolve. It's also really interesting to see that if you look at any particular contribution, any particular change that people are getting in, it looks really slow. A bug fix that everyone agrees has to be fixed, but is non-trivial to implement, can take three or four months to get in. A feature that may be slightly more controversial can take years. And so you look at the PR and you say, oh God, there's this feature, and Oracle had it forever, and they've been just discussing it for three years. Will it ever get in? But I think in a way that misses the bigger picture, because when you look at releases, Postgres has a really good process. You know exactly, to the day, a year in advance, years in advance, when each release is going to happen. And in the few months running up to the release, you know which features are very likely going to make it or not. So it's probably the most predictable open source project I've ever seen in my life. Apparently they were only late for a release once, by a few days, because they found a very critical bug in something that was very hard to leave out of the release, something along those lines, and it took them an extra few days to completely rewrite something in order to make it in. But in general it's been excellent. It's extremely reliable. And when you look at every release, there are always good features. Now, Postgres is very large and not everyone needs every new feature that makes it in. For example, one of the most exciting features for this new release (this is Postgres 17, released a few weeks back, so it's still very new), the feature people got really excited about, is an optimization for queries that involve filtering on a list: WHERE the value in a column is IN a list of values. It used to be that an index had to be scanned for each value in the list, which can get bad if the list is giant. Now it can do one scan and find everything. So this is a cool optimization and everyone is very excited about it. But of course, if you look at the large universe of workloads, not everyone uses IN, and even if you do, would you really go through the upgrade just to get an IN optimization? Is that the biggest bottleneck in your application? Whether or not you feel pressure to get those new features really depends on your specific workload, which is why you sometimes see people running a five- or six-year-old Postgres in production with no problems at all. Although if you use partitions, don't do that; partitions got a lot better in the last few releases. But a lot of workloads will work well on Postgres 12. I wouldn't recommend running Kafka from five years ago in production, and I think some newer projects evolve much faster, but with Postgres, depending on what you do, you can be fine with older versions, I would say.
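As a rough illustration of the query shape being discussed (the IN-list filtering that Postgres 17 scans more efficiently), here is a minimal sketch using psycopg2. The connection string, table, and columns are invented for the example, and the visible plan may look similar across versions because the improvement is internal to the index scan; the difference shows up in buffer counts and timing rather than in the plan shape.

```python
# Sketch: an IN-list filter on an indexed column, the query shape that
# Postgres 17 optimizes. Connection details, table, and columns are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        EXPLAIN (ANALYZE, BUFFERS)
        SELECT id, status, total
        FROM orders
        WHERE customer_id IN (101, 202, 303, 404, 505)
        """
    )
    # The plan typically shows an index scan with a "customer_id = ANY (...)"
    # condition; on Postgres 17 the scan can satisfy the whole list more
    # efficiently, which shows up in the buffer and timing numbers.
    for (line,) in cur.fetchall():
        print(line)
conn.close()
```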
Absolutely. And so much of the development that goes into databases is just continued optimization. There's a lot of very sophisticated work that goes into optimizing information storage and retrieval, different types of indexes, even using probabilistic data structures and things like that. It's a lot of development, and at the same time it has to remain resilient, it has to be backwards compatible, all these things, and Postgres is amazing at that. So, two ways that Postgres testing is amazing. First, there was the very famous backdoor that almost made it into SSH a few months back. It was discovered by one of the core Postgres engineers, Andres Freund, while he was testing the performance of Postgres. He noticed that his tests were getting slow because SSHing to the machine, the SSH process, was taking more CPU and taking slightly longer. This looked weird to him, so he investigated. So it's clear that there is a lot of detail-orientedness that goes into making sure there cannot be any performance regression in Postgres, ever. And the other cool thing: people these days talk a lot about deterministic simulation testing, and Postgres has had something very similar for testing transaction isolation for many years, where you basically write some scenarios, the transactions do things, and then they shuffle them and make sure that all the shufflings still maintain the same invariants and you get the same results no matter how you move things around, which basically tests that you have transaction isolation. And at Nile we actually used the fact that those tests exist to test some of our own transactional guarantees, because we do some distributed transactions that are not part of normal Postgres, but we could still reuse their tests. It's pretty cool that it was so far ahead of its time. Yeah, absolutely incredible. I think that's why it's taking a lead in community adoption as the open source database of choice for a lot of data infrastructure companies and companies offering databases as a service to the enterprise, and it's super well adopted. There's no debating that. So it's great to hear your perspective from in the weeds of working with the Postgres development community. And it's great context to know there's so much rich history. Databases have to be, in my opinion, developed over 20 to 30 years. There's a popular Hacker News post about Oracle's code base, this massive 25 million lines of C, and former Oracle developers allegedly chimed in and commented saying getting one feature in there just took such a long time. And I think that's the only way to have a database. If someone comes in and says, hey, I built a database from scratch in the last six months, and they generated it with AI or something like that, that's a big red flag. And now I want to try it: go to ChatGPT, ask it to generate a database for me, and see what happens. Yes. Yes. If ChatGPT is smart, it'll decline your request. Or something like that. Exactly, it'll say, no, this is really complicated. Yeah. Then we'll know that AGI is here, but if it actually tries to generate it in Python or something, there are problems. Yeah. Sam Altman has some work to do. But that's great to hear the latest there. And yeah, curious to hear what else is going on with Postgres. Yeah.
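Gwen's description above of the isolation tests (write the transaction steps, shuffle the interleavings, check that the invariants still hold) can be illustrated with a toy, in-memory sketch. This is only an analogy for the idea, not Postgres's actual isolation test harness; the "transfer" transactions and the balance invariant are invented for the example.

```python
# Toy sketch of "shuffle the steps, check the invariant" style testing.
# Invented scenario: two transfers between accounts; the invariant is that
# the total balance is the same no matter how the steps interleave.
from itertools import combinations

def transfer(src, dst, amount):
    # Each step is a small function that mutates shared state.
    return [
        lambda state: state.update({src: state[src] - amount}),
        lambda state: state.update({dst: state[dst] + amount}),
    ]

tx_a = transfer("alice", "bob", 30)
tx_b = transfer("bob", "carol", 50)

def interleavings(a, b):
    # Every way to merge two step lists while keeping each list's own order.
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        merged, ai, bi, pos = [], iter(a), iter(b), set(positions)
        for i in range(n + m):
            merged.append(next(ai) if i in pos else next(bi))
        yield merged

for schedule in interleavings(tx_a, tx_b):
    state = {"alice": 100, "bob": 100, "carol": 100}
    for step in schedule:
        step(state)
    # The invariant must hold for every possible ordering of the steps.
    assert sum(state.values()) == 300, state

print("all interleavings preserve the total balance")
```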
So the other thing is, because as you said the community is very careful about adopting features into core Postgres, a lot of the interesting, exciting things are in the ecosystem, with people publishing what are called Postgres extensions for all kinds of scenarios. While I was at the Postgres conference, I chatted with a really nice person who is building a Postgres extension that adds support for Arrow. And there are multiple extensions with support for the Parquet file format. So if you put those together, suddenly there are some big data operations that just became a lot more efficient in Postgres. This is pretty exciting, and no one had to wait on Postgres to decide whether they want or need a new file format or a new memory format or anything along those lines, because you can just do so much with extensions. And then you have extensions that have been around for a long time and are still generating a lot of excitement. One that just kept being mentioned as an example at the conference, in all kinds of situations, is HypoPG. It basically allows you to create hypothetical indexes without spending the time and resources of actually creating them, and without risking the production repercussions of actually having them. And then you just check: how will this query behave if I had this index? Or, more likely, you have a workload that mixes inserts, updates, selects, all of that; will this workload be faster or slower if I had those other indexes? I don't know, database people are slightly obsessed with performance and optimization. It's a big part of what we care about, so this is one of the things we keep coming back to. There are also cool optimizations going on at the network layer, optimizations for making SSL more efficient, which is really important. It used to be, with Postgres, and I don't know if it exactly predated SSL, but not only did it originally not support SSL for its first few years, because it was really one of the earliest databases that existed, but for a long time nobody really thought SSL was all that important. It all ran in the data center. Do I really need SSL between every machine in my data center? And I think the security posture of most companies in the last 10 years shifted to a point where, yes, of course you're going to run SSL between everything. And also, with all the Postgres-as-a-service offerings, a lot of the time you actually access Postgres beyond firewalls. So you obviously need a much higher level of security, and nobody's actually willing to trade off security for performance. We want all of it. So having better, more efficient SSL support in the protocol is obviously important. Yeah, that's a great point you brought up, and I'd love to dive into each one of those, but if we're just going last in, first out, the point about SSL: yes, it's very important that security teams are now standardizing on that, making sure Postgres connections are always secure over networks. I've worked with enterprise security teams that just don't allow, for instance, inbound network connections to an on-premise data center, because they know that data centers are, you know, maybe 20, 30, 40 years old, and for all they know there's a bunch of databases with one-two-three-four-five as the password, no network-level controls, none of that stuff, and no security patches applied for the last five years. For sure.
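Returning to the hypothetical-index idea mentioned a moment ago, here is a minimal sketch of the HypoPG workflow. It assumes the hypopg extension is available in the target database; the table, columns, and query are invented for the example.

```python
# Sketch: "what if I had this index?" with the HypoPG extension.
# Assumes HypoPG is installed; table, column, and query names are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hypopg")

    # Register a hypothetical index: nothing is built on disk, no write load.
    cur.execute(
        "SELECT * FROM hypopg_create_index("
        "'CREATE INDEX ON orders (customer_id, created_at)')"
    )
    print("hypothetical index:", cur.fetchone())

    # Plain EXPLAIN (without ANALYZE) lets the planner consider the hypothetical index.
    cur.execute(
        "EXPLAIN SELECT * FROM orders "
        "WHERE customer_id = 42 ORDER BY created_at DESC LIMIT 20"
    )
    for (line,) in cur.fetchall():
        print(line)

    # Drop the hypothetical indexes for this session when done.
    cur.execute("SELECT hypopg_reset()")
conn.close()
```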
So now everyone's assuming, okay, you're going to get tons of requests coming in over networks from the cloud, or maybe another data center, but most likely the cloud. It's not as controversial to have your enterprise data in the cloud as it was 5 to 10 years ago. So yes, it's absolutely critical that SSL becomes first class and very performant for databases. And the other interesting one, which I want to actually learn more about because I'm always curious about this: you mentioned there was an extension for Apache Arrow, which, for the listeners, is an in-memory columnar format used in a lot of popular systems. Yes, absolutely. It makes operations on in-memory columnar data structures super fast, and it's been adopted by a lot of very popular open source and enterprise data lake platforms. And now that's becoming an extension for Postgres. So how would that work exactly? Is it actually going to absorb the Arrow format into the Postgres buffer cache? Or is it going to be something that sits outside of Postgres? I believe it's outside the buffer cache. So Postgres has the buffer cache, the shared memory area, and then every connection is served by what's called a backend process, essentially a process that represents your connection to Postgres, and it has its own memory area for things that are private. If I'm doing a large sort and I want to do it in memory, it's my sort, it's nobody else's, and it will get those bytes in my area of the memory. So my understanding is that the Arrow data is going to be in the backend process memory, not in something shared. And my understanding is that it is something you configure on specific tables and in specific scenarios. It's not something where you just get every table in Arrow, because that doesn't really make sense; it's optimized for specific types of data, specific tasks and so on. It really remains to be seen. I met someone and we just talked about it; he said he has an extension. I admit I haven't tested it, so I don't know when it is cool and how cool it is. Well, if they're doing it and you heard about it, it's probably cool. That's my bar for cool projects right now. And the other one you mentioned was Parquet, which is also super interesting. Parquet, of course, being a file format used by things like Apache Iceberg and Delta, and also becoming a standard for bottomless cloud storage: you can put tons of Parquet files on S3 or GCS, your cloud of choice, or on premise. So is the idea there that Postgres can query Parquet files? Yeah, this is exactly the goal. I think maybe a year back I talked to someone who was building an application and wanted to use Nile, and his application basically required processing a lot of documents offline. Essentially, you would get PDF catalogs from all sorts of vendors, furniture and all kinds of things like that, and it would process them and extract information out of them. And he would use Databricks for this information extraction. But then he wanted the data in Postgres to serve it. Now, Databricks, when it saves data natively, uses its own format, I think Delta, but it can write Parquet files very easily. And so he was asking, okay, how do I get the data from those Parquet files that my Databricks process gives me into Nile, so I can actually serve it to my customers and run some queries on it?
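Since the idea of Postgres querying Parquet directly comes up here, a hedged sketch of what the foreign-data-wrapper style of extension looks like (this assumes something like the parquet_fdw extension; other extensions take an ETL-style approach instead, as the next part of the conversation describes). The file path, server name, and columns are invented for the example.

```python
# Sketch: exposing a Parquet file to SQL through a foreign-data-wrapper-style extension.
# Assumes an extension such as parquet_fdw is installed; the file path, server name,
# and column list are all illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS parquet_fdw")
    cur.execute("CREATE SERVER IF NOT EXISTS parquet_srv FOREIGN DATA WRAPPER parquet_fdw")
    cur.execute(
        """
        CREATE FOREIGN TABLE IF NOT EXISTS catalog_items (
            vendor_id   bigint,
            item_name   text,
            list_price  numeric
        )
        SERVER parquet_srv
        OPTIONS (filename '/data/exports/catalog_items.parquet')
        """
    )
    # Once mapped, the Parquet data is queryable like any other table,
    # including joins against regular Postgres tables.
    cur.execute("SELECT vendor_id, count(*) FROM catalog_items GROUP BY vendor_id LIMIT 10")
    print(cur.fetchall())
conn.close()
```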
So we did some research together, because I did not know either, and we found a lot of different ways of doing that. Extensions can be the Wild West, and as you can imagine, Postgres is popular and Parquet is popular, so there are a lot of different approaches. Some of them are more suited for ETL, which is what we ended up working with, but some would basically open the connection and allow you to directly query a Parquet file. So this is really cool, right? Yeah, absolutely. It is really becoming a form of interoperability between the core components of cloud and SaaS infrastructure. And it's great to hear that Nile users and customers are ahead of the curve there in how they're adopting it. I do see Parquet and Postgres as very foundational components and pieces of the stack. And a big part of the drive and investment going into this, and I'm talking not so much about venture capital but more about enterprise investment now, is modernizing for AI use cases, right? Generative AI is a very popular trend right now, but I do think, based on what I'm seeing and the conversations I'm having, that it's top of mind for many enterprises to implement it and fuse it into their operations and customer experiences. How is Postgres supporting AI workloads? Yeah, so basically AI workloads is a very wide term; there are so many different models. Obviously, what comes to everyone's mind when we talk about AI is the language models, but they're definitely very far from being the only models out there. There are a lot of models around, for example, converting images to images; those are not language models, they're transformers of a different sort. But everyone thinks about language models, and within language models one of the biggest subsets, especially for the enterprise, is what is known as RAG, retrieval augmented generation. The idea is that if you ask a model a question without a lot of context, it is quite likely to hallucinate, or to answer something that is out of band and not really relevant to what you need. But if you provide it relevant information, it can combine the fact that it has a lot of knowledge about the world from studying the entire internet, and understands nuances of the language, both in input and in output, with some relevant information for your question, for your use case. Now the problem is: how do we find relevant information to give the AI along with my question? And this is where vector embeddings come in. An embedding model is a different type of AI model, essentially, that knows how to convert text into a set of a lot of numbers that represent something about the meaning of the text. And then if you take two texts that turn into vectors and ask how close they are to each other (they're vectors, you have a distance function), that closeness represents how similar the meaning is, which is a form of how relevant they are to each other. And semantic meaning is different from the old text search rules based on word frequency and dictionaries, because it captures something about the meaning of a word in a context, in a domain. So "function" and "procedure" are very similar if you're talking about a programming language. They're very different from each other in almost any other context in the world.
If you talk to a lawyer and tell him that "function" and "procedure" are the same thing, he will think you are out of your mind. Obviously a procedure is something very specific in law, and functions are completely unrelated. So you get this ability to basically vectorize the user's question and use that to find documents that have similar semantic meaning to it, meaning they are more relevant. And then you feed those to the AI and get an answer. So databases now need to be able to do more than select by a value in a column or something; they have to do vector search: find me the vectors most similar to this one. This is known as nearest neighbor search, because you imagine this multi-dimensional space and you're looking for the vectors that are closest to the one I just gave you. This is what vector stores specialize in, and there are five gazillion of them out there, for obvious reasons: everyone is going to need one, especially enterprises, and if it's faster, it's generally better, so you're going to want to take a slice of this pie. Now, there are a lot of specialized vector stores, and Postgres, just like it can do Parquet, has an extension called pgvector that specializes in vector search. Postgres is not the only relational database, or even non-relational database, that suddenly grew an extension that does vectors. I think MySQL, Oracle, MongoDB, DataStax, Databricks, I don't even know who else is doing it. Everyone is doing vectors. You cannot really have a database without vectors these days. Absolutely, yeah. A person who needs to build this architecture basically faces two decisions. The first is: do I want a specialized vector store, or do I want a vector store that is part of another database? And honestly, I cannot advise a lot on that. I haven't tested a lot of the other vector stores. I would say that defaulting to the tech stack you're already familiar and comfortable with, until proven otherwise, is not a bad idea. So if you already have MongoDB, try using MongoDB and see how it goes. If you have Postgres, try using Postgres and see how that works. If you have two, let's say you use Postgres for your OLTP and BigQuery for your data warehouse, then you start thinking about what kinds of data you need to combine together. A lot of the time, the use of vectors and AI is part of an OLTP flow. It is part of your real-time application: a user logs in and looks at some of their data. Let's say that you are Stripe, and they look at their latest revenue statements, and now they have a question. Something seems off. And then, because you have AI capabilities, because that's what everyone is about these days, the customer asks: can you please explain this data point to me? Now, what is this data point? Because this is going to be the most relevant context, and you're going to want to include it in the question, and you also want to find other relevant information based on exactly what this is. You're already likely connected to an OLTP database, and you know the last thing you retrieved, what the user is seeing. To build RAG into that is very straightforward, because your vector will exist right there in the database with the rest of the data the customer is currently looking at. If the AI comes back with an answer that references other data, it's very easy to pull it out and include it in the response; it's already there.
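To make the nearest-neighbor search concrete, here is a minimal pgvector sketch. The table, the three-dimensional toy vectors, and the query embedding are invented for illustration; embeddings from a real model would have hundreds or thousands of dimensions.

```python
# Sketch: semantic "find the closest documents" with the pgvector extension.
# Assumes pgvector is installed; table name and toy 3-d vectors are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS docs (
            id        bigserial PRIMARY KEY,
            tenant_id bigint NOT NULL,
            body      text NOT NULL,
            embedding vector(3) NOT NULL  -- toy dimension; real models use 768, 1536, ...
        )
        """
    )
    cur.execute(
        "INSERT INTO docs (tenant_id, body, embedding) VALUES (%s, %s, %s), (%s, %s, %s)",
        (1, "invoice for October", "[0.9, 0.1, 0.0]",
         1, "refund policy",       "[0.1, 0.8, 0.2]"),
    )

    # Embed the user's question with whatever model you use, then ask Postgres
    # for the nearest neighbors. <-> is pgvector's L2-distance operator
    # (<=> is cosine distance, <#> is negative inner product).
    question_embedding = "[0.85, 0.15, 0.05]"
    cur.execute(
        """
        SELECT body, embedding <-> %s::vector AS distance
        FROM docs
        WHERE tenant_id = %s
        ORDER BY embedding <-> %s::vector
        LIMIT 3
        """,
        (question_embedding, 1, question_embedding),
    )
    for body, distance in cur.fetchall():
        print(f"{distance:.3f}  {body}")
conn.close()
```

The returned rows are the candidate context you would then paste into the prompt for the language model, which is the retrieval step of the RAG flow described above.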
Now, if you put the vectors in the data warehouse, first you need the online application to suddenly connect to the data warehouse, and then you need to tell the data warehouse people not to take the data warehouse down anymore, definitely not every night between the hours they're used to taking it down for data loading or whatever they do at night. So in my mind it gets a bit more complicated, because data warehouses are just managed in a different way. They don't always have the freshest data, and that is a bit of a concern. If you have a real-time data warehouse, then you may want to experiment; you could do both. And the other way to look at things is that you may want to do some preparation, for example index the vectors or optimize them in some way, on the data warehouse, and then copy things over to an OLTP system. I'm not sure this is something that is very easy these days, but it's something that I'm experimenting with, because vectors are really weird. There are parts of the flow that are a lot like data warehouses: the ingest process looks a lot like a data warehouse, and creating indexes can be very heavy, so the data warehouse machines may be a better fit. On the other hand, this interactive "I'm chatting and I need to continuously pull stuff up" is very much OLTP-ish. So I really think we need to find a way to use the data warehouse for what it's good for, use OLTP for what it's good for, and get the right data structure to the right place over time. Yeah, incredibly valuable advice, Gwen. And thank you specifically for the free consulting, because I use Postgres and BigQuery, so that's directly applicable to me. Thank you. No, I'm joking. But it is a great point that, yes, a year or maybe two years ago, when we were early in generative AI with language models, there was this craze to adopt a new stack, a new vector database, whatever it was; a lot were spinning up at the time. And they're probably great technology, with specialized go-to-markets for that motion, which has its own value. But for teams that don't change things super fast, yet do have to innovate when the time is right and already have large investments in things like Postgres, or whatever their database is, yes, it makes the most sense to test the new features that came out to support you, whether it's a vector extension or your data warehouse provider offering vector columns, and see how that goes. Because when I researched this 18 months ago, it sounded like you needed a totally new stack, but now, with all the announcements literally in the last month, it looks like my data warehouse, my database, and my BI tool all support "ChatGPT on my data" type experiences, right? It's such a big problem with so much attention on it that it's not like NoSQL 15 years ago, where the large database providers didn't really invest a lot in competing in NoSQL, so new players like MongoDB, or even Lucene-based inverted-index systems like Elasticsearch, could come into the market and grow. So that's what's interesting about it. And the other thing, especially when you're talking about this, is that there are really two categories of it. One is customer-facing chat experiences, where you really want to make sure it doesn't hallucinate, and RAG is critical to make sure it has all that context. But that's one part of it.
The other is internal workloads, where, yeah, you could probably tolerate a little bit more hallucination, but these language models can really be tripped up by inconsequential information, which means the way you actually index this text really matters, right? So if your single view of the customer, or whatever that knowledge needs to be, is actually spread across, let's say, 200 normalized tables in different relations, and you're representing customer names with IDs in some tables and with other values in others, and you have to do a bunch of joins, you also have to think about where the ETL is happening. Exactly. Yes, 100%. And we're already seeing a lot of ingestion pipelines and ETLs specifically for AI. And because having AI in the name and the pitch deck guarantees funding these days, you see a lot of new kinds of ETL or workflows for AI. I think a lot of the ETL incumbents are maybe not moving fast enough, or maybe, I don't know about Striim, maybe you are moving fast enough, but ETL is an old practice, right? The next generation is already real time and all this, but there is IBM DataStage going ages back, and I don't know if they're moving fast enough to be part of this new vectorized ETL. Absolutely. The whole landscape is shifting underneath data and infrastructure providers: how are you going to best support AI workloads? And so I did want to ask you, you gave very comprehensive advice and a great answer on the state of AI for Postgres specifically; what are some of the advantages of doing RAG and vector embeddings with the Postgres vector extension compared to other stacks? Yeah. So basically I would say the biggest thing is that you are using Postgres, and I do think Postgres is the perfect database for building applications on top of, and obviously I'm looking mostly at B2B, those multi-tenant applications. If you think about Postgres, it just has such strong support for such a wide variety of workloads that for someone who is just building an application, in the first few years before you have time to hire a data engineering team to build your ETLs and data warehouse and all that, it can do data warehouse workloads, it can do document workloads, JSON documents like MongoDB. It may not be as amazing at that as MongoDB, but it's one database that can do that, and relational, and text search. And if you don't like the built-in text search, there are 500 extensions with better text search. It can just do so much, and it can do vectors. So if you're looking for a very capable database that also does vectors, Postgres is really good for that. And then I've been watching pgvector extremely closely, and it's so nice to see it also release like clockwork, much like the Postgres community, with a release every six months or so, and the improvements and features that go into every iteration are just so impressive. And again, those very detailed benchmarks. And it moves so fast. You'd think every half year is not that fast for AI, but when I talk to people using it, they literally don't even know about the cool stuff that happened back in April, and we're already looking at the next release that gets even better on top of that. So one of the cool optimizations that I really love there is the support for quantization. Quantization is one of those AI topics that not enough people know about, and it's such a game changer.
So everything about AI is those large vectors and matrices, and those are extremely large almost by definition: the more numbers you have, the more data you can represent. But every number in those very large vectors is a floating point number, and it turns out that you don't actually need all 32 bits of a floating point to get a good result. A lot of the time 16 bits, even 8 bits, as long as those are the significant bits, are actually good enough. So now you can get the same result with half or a quarter of the memory. It basically means that you don't need the latest, greatest, impossible-to-obtain, super expensive GPUs. It can run just fine on my Mac M1, or on something very cheap that I can rent from the GPU vendors out there. And this also applies to embeddings. pgvector allows you, instead of using a full vector, to use a halfvec type, which is exactly what it sounds like: 16-bit floats instead of 32. That's it. Half the cost, double the speed, and you did absolutely nothing except search and replace vector with halfvec in your script. So it's pretty cool stuff. I feel like the problem is not that people don't use pgvector; it's that they use it but don't know how to get the most out of it. Maybe that's the problem. Absolutely. And this is really what it comes down to: there's such a large ecosystem on top of Postgres, and ultimately delivering an application built on top of a language model will require that extra work, that extra optimization. And like you said, this is overall a probabilistic operation, right, when you do the similarity search on vectors. And when you think about some of the most popular data structures for probabilistic operations, you think of HyperLogLog, which is used for cardinality estimates, and that also exists for Postgres, not exactly in the AI context, but just as an example of the ways the Postgres community has optimized performance. It's a recurring theme with Postgres that whatever technical gap needs to be filled, it's going to be addressed by the community and its leaders, and you'll always be in good company in terms of those who adopt it and scale it. And there will be high costs with generative AI applications; you don't want to pay tons of extra money on database licensing, for example, in some contexts. Being able to do this with Postgres gives you not only the community behind it, but also such a rich set of great SaaS providers to work with that also standardize on Postgres, have Postgres-compatible APIs, and are very friendly to Postgres. With that being said, I'd love to hear about what you're working on at Nile. Yeah, thank you. Of course. Basically, as you've heard, I'm really excited about Postgres, and I'm really excited about the idea that you don't have to think about a separate database for this type of data versus that type of data. You are mostly thinking about, okay, I'm building this application, what do my customers need? And then you know that Postgres is going to give you all of it. If you need vectors, if you need relational, if you need JSON, if you need to join all three in some way to give your customers what they need, you can do all of it with Postgres. So for us, that was the beginning: you can build anything you want on Postgres. It is a great basis for an application.
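Going back to the halfvec point above, here is a small sketch of what the swap looks like. It assumes pgvector 0.7 or newer, where the halfvec type and its operator classes were added; the table, index, and dimension are invented for the example.

```python
# Sketch: storing embeddings as 16-bit floats with pgvector's halfvec type.
# Assumes pgvector >= 0.7; table/index names and the dimension are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

    # Same shape as a vector(1536) column, but each component is a half-precision
    # float, so the column (and its index) take roughly half the space.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS doc_embeddings (
            id        bigserial PRIMARY KEY,
            embedding halfvec(1536) NOT NULL
        )
        """
    )
    # HNSW index using the halfvec operator class for L2 distance.
    cur.execute(
        """
        CREATE INDEX IF NOT EXISTS doc_embeddings_hnsw
        ON doc_embeddings USING hnsw (embedding halfvec_l2_ops)
        """
    )
    # Queries look the same as with vector; only the declared type changed.
conn.close()
```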
And then, when we started Nile, we talked to a lot of people building software as a service, and you've been kind enough to talk with us and help us out in our large discovery effort. We talked about what problems people run into at different stages of building not just an application, but also a business on top of the application. And we realized that very early in the process, you have to make one big decision. Are you going to use the same Postgres for all your customers, and just put them into big tables that have a tenant ID column in them? Or are you going to have a database per customer and isolate them more, so you can maybe give each one better service, because you can back up and restore them individually and change things for them individually? You have to make this decision so early in the game, and it is so hard to change if you got it wrong. And if you think about it, it's so tied to how your business is going to work. If you're going to sell to larger companies, enterprises, of course it makes sense to have a database for each one of them. If you're going to have 50,000 small customers within the first year, then of course it makes sense to store them all in one large set of tables. But how do you know, when you just go and design your first data model? And what do you do if you get it wrong? For us, this was the missing bit in the architecture: the fact that you don't just have a random database. This database holds data not just for you, but for your customers. You need to figure out a model that gives them the right service, but also has the right cost and manageability for you. And when you start thinking about it, it's not your data, it's the data of your customers. How do you authenticate? How do you manage access to it? The current model, where you have a single user connecting to your customer database and you only do any kind of filtering and access control in the application, seemed absolutely terrifying to me as a data person. Really, the entire world, this is how you're shipping applications? I trust you with my data, and this is how you secure it? It sounded very scary. So we wanted a better system at Nile, and we were a bit naive. We were like, okay, Postgres is very hackable, you can write all those cool extensions, how hard can it possibly be to build a model where it looks like you're inserting everything into the same table, you still have one connection, you do an insert and you put in the tenant ID, but behind the scenes we actually detect which tenant it's for and route it to the right database? When you run a query, if it's for one tenant, we route it correctly. If it's for multiple tenants, we collect all the results and give you back a collective result. So we basically set out on the path of re-architecting Postgres, which on day one sounds like, how hard could it possibly be? And then two years later, it's okay, we are still here, still on an interesting journey, but some of the pieces are now in place and we can see where it's going. And yeah, that's basically what we ended up with. Very cool. Nile is doing some absolutely incredible stuff, and I love how it works great with Postgres. If you're already a sophisticated Postgres user, it takes away so much of the boilerplate, and not just that, these generalized problems of how you deal with multi-tenancy in applications and how you create isolation for certain users.
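For readers less familiar with the "shared tables with a tenant ID column" model Gwen contrasts with database-per-tenant above, here is a minimal plain-Postgres sketch of that pattern using row-level security. This shows only the generic pattern, not how Nile implements its tenant routing; the table, column, and setting names are invented for the example.

```python
# Sketch: shared tables with a tenant_id column, enforced with row-level security.
# This is the generic pattern, not Nile's implementation; names are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS invoices (
            id        bigserial PRIMARY KEY,
            tenant_id uuid NOT NULL,
            amount    numeric NOT NULL
        )
        """
    )
    # Every query is automatically filtered to the tenant set on the connection.
    # Note: RLS applies to roles other than the table owner, so the application
    # should connect as a regular role (or use FORCE ROW LEVEL SECURITY).
    cur.execute("ALTER TABLE invoices ENABLE ROW LEVEL SECURITY")
    cur.execute(
        """
        CREATE POLICY tenant_isolation ON invoices
        USING (tenant_id = current_setting('app.current_tenant')::uuid)
        """
    )

# At request time, the application sets the tenant for its session before querying.
with conn, conn.cursor() as cur:
    cur.execute("SET app.current_tenant = '11111111-1111-1111-1111-111111111111'")
    cur.execute("SELECT id, amount FROM invoices")  # only this tenant's rows are visible
    print(cur.fetchall())
conn.close()
```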
And I really love the templates that you have on your website too, which show how fast it is to deploy an app with the infrastructure that I already use. That's very valuable, because I don't want to spin up a new type of database, and even worse, nobody wants a new programming language, absolutely. Yeah. It's great to provide a fast path for users, not only to deploy, but also to scale, in a framework that really does allow you to standardize and grow with the infrastructure you already have. I think that's what's going to continue to be resilient, because if you look at the databases people were using 10 years ago, yes, there are new entrants and niche types of databases, but the big players are still very active, Postgres being, of course, the largest open source one. So yeah, very excited about your work. I've been following your talks for a long time. Sorry to the listeners for the history lesson going back to Strata, but hopefully some of you were there and able to appreciate it as well. But Gwen, that's great. Where can people follow along with your work? So there are a few places. First of all, I always have and will continue to blog a lot, and these days it's on the Nile blog. I'm very excited about AI, if you haven't noticed, and especially multi-tenant AI, where you actually get to isolate the tenants, so I post a lot about that and about optimizations and so on, and also some Postgres basics from time to time. And then I am less active on Twitter than I used to be, and it's no longer called Twitter anyway, but you can still find me on X. And yeah, I'm also available on LinkedIn; I normally accept connections, post interesting things, and respond to people who ping me. Oh, and Nile has a Discord, so if you want a live chat, I am there. Because a lot of our users are there, I'm very responsive on our Discord. That's incredible. So many great ways to follow along with Gwen and stay in touch with her. I totally recommend it for everyone who wants to keep their skills sharp and up to date, especially with the awesome open source infrastructure where Gwen's done a lot of work, both through her work at Confluent and now at Nile. So Gwen, thank you so much for joining this episode. I'm really excited to continue following along with you, and thank you to the listeners for tuning in. Thank you so much. It's been a pleasure.