What's New In Data

Cultivating Developer Communities and Revolutionizing Data Analysis with Viktor Gamov

Striim

Unlock the secrets of engaging developer communities and the transformative world of real-time data analytics with our guest, Viktor Gamov of StarTree. From crafting code to leading developer relations, Viktor unravels his career evolution, highlighting how fostering connections and sharing knowledge with developers has reshaped the landscape of tech communication. His take on the democratization of technical know-how reveals the profound impact of making what was once consultancy-exclusive accessible to all. Tune in for a masterclass on the importance of community in the tech industry and how it can break down barriers to innovation.

Are you ready to see data come to life? Viktor's thrilling exposition on Kafka, ksqlDB, and Apache Pinot turns the arcane into the amazing, using a real-time Pac-Man game dashboard to illustrate the revolutionary shift from batch to stream processing. Witness the rebirth of open-source technologies and grasp the concept of 'data in motion' as we discuss the critical importance of streaming platforms in modern data architecture. Viktor's expertise in developer relations shines as he demonstrates the value of making complex tech relatable and relevant to business needs.

The data landscape is ever-evolving, and with the rise of AI, the stakes have never been higher. In an era where milliseconds matter, Viktor peels back the layers of how Apache Pinot is driving real-world solutions across a wide range of industries. From restaurant load management to transaction tracking, discover how real-time analytics are informing strategic business decisions. As we journey with our guest back to his roots in data and game development, we're reminded of the cyclical nature of passion and profession, where one's beginnings often foretell the trajectory of a career. Don't miss this episode, where we connect the dots between nostalgia and the next wave of data innovation.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns in real-world data architectures, and analytics success stories.

Hi everybody, thank you for tuning into today's episode of What's New in Data. I'm super excited about our guest today. We have Viktor Gamov. Viktor, how are you doing today?

Hey, John. It's great to finally get on the show. Long-time listener, first-time caller. Thank you for inviting me.

Yeah, absolutely. You're a great expert in the industry, and I'm really excited about getting your perspective here on the show. Tell the listeners a bit about yourself.

Yes. I'm head of developer advocacy at a company called StarTree. At StarTree, we build a commercial offering on top of a quite popular open-source real-time analytics system called Apache Pinot. Before that, I spent a few years talking to developers in the API world. And before that I wrote a book about Apache Kafka while I was working at Confluent, talking about all things real-time data, or, you know, close to real time; some people are very sensitive about the term "real time," so let's say the streaming world, streaming data. Before that I was around the data world quite a bit, in fast data: Hazelcast was another company I worked at, where we were building a distributed caching system and a distributed stream processing system.

Excellent. I love that you brought up Hazelcast. When I was a senior in college doing NSF-funded research, I was actually using Hazelcast to process IoT device data, so it's very cool to hear that you worked there. You also have a background in software engineering, so I'm super curious: what made you pursue developer relations after working as a software engineer?

This is a very good question. For me, the way I moved from one thing to another was sort of natural. I was doing what we'd now call full-stack development as a consultant for around five years or so. As a consultancy firm, we obviously needed to advertise what we do, so we started building communities around the technologies we used. They helped us spread the word about our work, build a network of people who knew what we did, and bring people in and teach them some of the cool technologies, because a lot of people wanted to learn about this. We started doing this around the Princeton Java User Group and the New Jersey Flex meetup. At the time, Adobe Flex was a big thing for application development: any time you were building an enterprise application with the graphs and tables and things people like to see in their applications, you'd use Adobe Flex. While we were building communities around the technologies we used, liked, and promoted, I got my second job, where community work was already part of my job description, so it wasn't an undercover thing. Hazelcast was a company built on top of an open-source technology, with a big following in the open-source world and integrations with open-source tools like Hibernate, Spring, and other places that would use JCache or whatnot.
For that, we had to build a process around how we work with the community and how we talk to people in the community, and that allowed me to develop not only technical skills but also communication skills, because you spend your time reading GitHub issues or discussing issues with people on Stack Overflow. I kind of like it. And my overall mentality about the job is that I don't believe in job security by withholding information. I'd rather share information with as many people as I can. It doesn't matter what I know or who I know; what matters is who knows what I know. Thankfully, I was able to officially transition into developer relations when I was at Confluent. I spent my first year there in professional services, helping to integrate, install, and build systems on top of Apache Kafka, and then I had the opportunity to transition within the company. It still wasn't entirely smooth sailing, a little bit rocky, but that's basically how my part of the story happened.

Excellent. And developer relations is so critical. I run our data organization here, and it really seems like the best, most actionable information we get for our data stack is actually delivered to us through DevRel, because the way you message the technology is so consultative and helpful. Traditionally you would have had to pay a consultant to get this type of information, but now it's so accessible thanks to the work that folks like yourself do through developer relations and developer advocacy, really thinking about the end users of your various platforms.

Yeah, I agree. Having been a developer advocate for a while, on one hand it's an honor and a privilege, because the company puts a bit of trust in me to represent it to the outside world, meaning I know a few things not only about the product but also about the people and how we operate with the community. But it's also a good validation. There's a saying that everyone wants to be mentored, but are they ready to be a mentor? It's the same with DevRel: when a company establishes DevRel, it's important to understand that it's a two-way street. It's not only about going out to preach and talk about our product; the company also has to be ready for the group of individuals who make up developer relations and developer advocacy to bring feedback in. They will be the voice of the community inside the company, and sometimes that can be a very hard pill to swallow. Many companies might not be ready for that. Some companies get it, because they build tools and technology for developers, and I use "developers" as a term at large: I mean people who build systems, whether system integrators or administrators, people who touch the things and understand the value of particular tools.
So those developers are not necessarily the people who can sign the deal, but they can definitely be the people who kill the deal. These days, developers and builders, people with the skills to use and pick software, can choose the software they like and use the software they like. It's no longer the days when people used software acquired through some successful golf game, where someone would close the million-dollar contract and send the WebSpheres and WebLogics of the world down to developers to suffer. These days it's slightly different. That's why good relations and establishing good trust with the community are important for companies building tools and systems targeted toward builders and system developers.

Yeah, and I think that's so interesting. It seems like kind of a secret hack for a lot of companies that really did well: resonating with the community. Like you say, the end users might not control the budget and say, hey, we're going to spend a million dollars on this tool. But if the conversation comes up in a company that, hey, we're about to spend a million dollars on a tool, you obviously need mass adoption across all the various end users who would be building the systems, like you said. And if they don't resonate with the product, if it feels like a black box where they don't feel empowered to be productive at their jobs, they're not going to accept the tool, which will ultimately block its growth within the company. I've seen that happen with a lot of tools that are secretive about how their product works. So what are some of the ways you've connected with the community on the various products you've worked on?

Yeah, this is a fundamental question for many developer relations specialists, not only developer advocates but also developer relations engineers, people who help the community: understanding what value (sorry for using this kind of word) we're actually bringing. What will change if developers use this tool? And, more importantly, what will change for them if they stop using it? Those are the answers we in developer relations and developer advocacy are trying to provide. So first, let's talk about the more inspirational side: what this gives them. My personal way of thinking about technology, the way I tell people about technology, is that it's not about features; it's about what is possible to do with those technologies. Back in my days doing Kafka advocacy, people loved to talk about distributed architecture, distributed consensus, all sorts of things that might happen in a distributed system. That's a good conversation to have among fellow geeks and distributed systems enthusiasts.
But in general, the most interesting part is what's possible to do with Kafka that was not previously possible with existing technology. How difficult is it to build, say, a real-time scoreboard for a real-time online game? How quickly can you see the results, and how are results accumulated across multiple players? Would it be possible to do that without Kafka? Yes, of course; there were technologies available. Would it be natural to do it with those technologies? Maybe not. It might be a bit cumbersome to use something like a traditional database to store this information. Fundamentally, Kafka also has storage, but the way Kafka data can be consumed, and the way the data is represented at the very beginning, opens up many different possibilities. And the second thing I mentioned: what happens if we take this away? If this product or system disappears, how long will it take developers to replace it, maybe with some other tools or technologies? That's another interesting question DevRel helps to answer. But my personal philosophy is simply to show what is possible, not the features. The inspirational part of DevRel is that when you show something, developers can figure out how it can help them. We had a very popular, very cool demo that used ksqlDB and Kafka. ksqlDB is a stream processing database that has SQL syntax and can do stream processing things. We took the Pac-Man front end, and the front end communicated with the back end through Kafka: the movements of Pac-Man were sent to Kafka as coordinates and the like, enough to drive a real-time dashboard showing how many points each player collected, in real time. Would it be possible to do without all this? Of course, 100 percent. But the way we structured the demo, we clearly showed a real-time use case, how it can be solved elegantly with just a few pieces of technology, and what the data would look like, so that many developers could relate to the problem.
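The demo Viktor describes was built with ksqlDB; as a hedged sketch of the same idea in code, here is a minimal Kafka Streams topology that accumulates per-player points into a continuously updating scoreboard. The topic names, serdes, and value types are assumptions for illustration, not the actual demo's schema.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class ScoreboardApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Each game event is keyed by player id, with points earned as the value.
        KTable<String, Long> scoreboard = builder
                .stream("pacman-events", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .reduce(Long::sum, Materialized.as("scoreboard-store"));

        // Emit every scoreboard update so a dashboard can subscribe to changes.
        scoreboard.toStream()
                .to("scoreboard", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pacman-scoreboard");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

The ksqlDB equivalent would be roughly `CREATE TABLE scoreboard AS SELECT player, SUM(points) AS total FROM pacman_events GROUP BY player;`, the same aggregation expressed as streaming SQL.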
Yeah, it's very powerful to think about the art of the possible and inspire people by showing them: if you actually invest your time, and your money via your time, in a product, what's the best it can do, and how will it change the game at your company, on your team, and bring value to your business? That's something developer relations is uniquely positioned to do. For instance, you brought up streaming SQL, which is a very powerful concept. The product I lead basically combines change data capture and streaming SQL to build real-time data products in the business. But when you just look at the low-level technology, it's hard to paint that picture. It's like, okay, change data capture: this is mining database logs, it's complex technology, I don't know what I get out of that. And streaming SQL: it's SQL, but how is it different from just running queries on my database? People can't really see the big picture until you show them what's possible. Developer relations, through demos that show a real-time dashboard or a real-time data product feeding a back end or a customer-facing operation, is where people really see the value and think, okay, let's take this low-level technology and apply it. Now, you have a lot of great experience in the data industry across various companies, and since you work so much in the community, I wanted to get your take on the current state of modern data architectures.

Wow. This is 100 percent not a setup question, because we could talk about any technology here. These days, I would say we're seeing a renaissance of this type of technology, especially the open-source technologies: the way we transitioned from batch processing a couple of years back with Hadoop, and after that Spark, which was and still is a quite powerful tool that bridges batch and streaming, and there's still a lot going on with Spark. And after that, Apache Kafka, with all the technologies around it, Kafka Streams, ksqlDB, Apache Flink, all of which have become household names for many organizations. So first of all, and I don't know if you'll agree: if you look at open-source technology, and I know Striim had similar capabilities before Kafka was popularized, in terms of taking change data capture output and sending it on, say through JMS. Back in the day when I was at Hazelcast, we actually built some integrations on top of Striim. We used change data capture to do what we call a hot cache: some data required by an application would be represented in, let's call it, denormalized form, ready for the application to consume, so not much transformation happens; anything else would potentially be handled by application logic. Hazelcast has this thing called data loaders (I don't remember exactly what they call it these days) that essentially lets you populate a distributed map based on some sort of query, so the data is available as simple key-value pairs and you can get it immediately.
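For readers unfamiliar with the hot-cache pattern Viktor mentions, the "data loader" he refers to is likely Hazelcast's MapLoader (or the related MapStore) contract. Below is a minimal, hedged sketch of a MapLoader that lazily populates a distributed map from a backing SQL query; the JDBC URL, table, and columns are invented for illustration, and this is not the integration his team actually built.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import com.hazelcast.map.MapLoader; // package name per recent Hazelcast versions

// Loads denormalized rows into a Hazelcast map on demand, so the
// application reads them as plain key-value lookups.
public class CustomerMapLoader implements MapLoader<Long, String> {

    private Connection connect() throws Exception {
        // Hypothetical JDBC URL; in practice this would be pooled and configured.
        return DriverManager.getConnection("jdbc:postgresql://localhost:5432/shop");
    }

    @Override
    public String load(Long key) {
        try (Connection c = connect();
             PreparedStatement ps = c.prepareStatement(
                     "SELECT payload FROM customer_denormalized WHERE id = ?")) {
            ps.setLong(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Map<Long, String> loadAll(Collection<Long> keys) {
        Map<Long, String> result = new HashMap<>();
        for (Long key : keys) {
            String value = load(key);
            if (value != null) result.put(key, value);
        }
        return result;
    }

    @Override
    public Iterable<Long> loadAllKeys() {
        return null; // returning null disables eager pre-loading of keys
    }
}
```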
Then there are tools like streaming SQL. Before streaming SQL, Kafka Streams did a lot of work in putting real-time transformation into the hands of developers, but Kafka Streams still requires coding knowledge. I think the importance of streaming SQL is the ability to work with data in motion, data that moves, not data that sits there. We already agreed that we solved that problem with traditional databases, relational, document-oriented, or whatnot. But if we're trying to build a real-time thing, we need to stick to real-time technologies. Streaming SQL creates an opportunity for people who know SQL to start looking into the real-time world and see how the problems they were solving with SQL can be solved with these real-time technologies: what Flink SQL does, or ksqlDB, and some other technologies that have emerged over the last couple of years. These let a wider range of users apply their existing skills to new problems. The syntax changes slightly; for example, you have statements like SELECT ... FROM a stream, and you have to define windowing functions, because if you need to do some sort of aggregation and the stream is unbounded, that aggregation is interesting to implement. You usually aggregate over a window, and how that looks in SQL is also interesting. So, modern data architecture: I would say the majority of people these days, if they're trying to push data closer to real time, will 100 percent be using some sort of streaming platform. It can be Kafka, or other implementations based on similar technology, but the idea is that the data needs to be in a format available for real-time consumption. The second thing is that there needs to be an engine that does something with this data. Kafka provides storage for historical data, a distributed transactional log of all operations, but sometimes it's cumbersome to query this data from Kafka, because Kafka doesn't have a query engine built in. When we turn the database inside out like this, the query engine becomes whatever application runs against it: it can be Kafka Streams, it can be ksqlDB, it can be something else. And the thing we do with Apache Pinot, which is also a sort of database, though not your traditional relational database but a database for analytical queries, is support those queries in a form of SQL. Data lands in a queryable format and can be updated in real time based on new data coming from Kafka. A very similar approach can be taken without bringing in bigger systems like Pinot, for example with Kafka Streams and ksqlDB. ksqlDB has its own way of storing data; with Kafka Streams, your application code becomes your query, and you need to figure out how you're going to store that data, which can be solved in multiple ways if you need to query data that's no longer in Kafka. So: Kafka for storage and delivery; Kafka Streams or Flink for massaging and transforming the data, the more complex things. Bringing data into Kafka is also a more or less solved problem with Kafka Connect. There are tons of connectors where you just point at your source (we called it a legacy database at the time, but it can be a microservice, an API, a database, a messaging system), get the data, and put it inside Kafka.
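As a hedged illustration of that last point: registering a source connector with Kafka Connect is a REST call with a JSON config. The sketch below posts a Debezium Postgres CDC connector definition to a Connect worker; the hostnames, credentials, and table names are placeholders, and exact config keys vary by connector version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Connector config: stream changes from a Postgres table into Kafka topics.
        String config = """
            {
              "name": "orders-cdc",
              "config": {
                "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                "database.hostname": "localhost",
                "database.port": "5432",
                "database.user": "replicator",
                "database.password": "secret",
                "database.dbname": "shop",
                "table.include.list": "public.orders",
                "topic.prefix": "shop"
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```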
And after that, you use other tools to process the data and store it somewhere else for further querying, or store it back into Kafka, literally using Kafka as the pipe for sending data back and forth between the components of a pipeline system.

Yeah. And there are so many great discussions going on around streaming versus batch. It seems like a lot of folks already have their, let's call it, opinion based on their experience: if they've been doing batch their whole career, they kind of don't see the value of streaming, and if you've been doing streaming your whole career, you just think of batch as limiting. That's generally what I see, and obviously the truth lies somewhere in the middle. So I'd love to get your perspective on streaming data technologies versus batch processing.

This is a very good question, John, because as you point out, it really depends who you're asking. There was an interesting panel a couple of months ago at a conference where someone deeply involved with Spark said, oh, you know what, our users don't care about real time; a few minutes of delay in the response is okay for them. And in some cases that's totally fine. If you're running an end-of-day report, you know you won't be able to run it until you close your operational day and have all the numbers from your trades, or your banking day, or whatever day is closed. After that, you have the whole night to create the report for the people who need that information the next day to do planning and things like that. That's a totally legitimate, 100 percent valid use case, and if it works for many people, it is great. With real-time data and real-time analytics, we care a little bit more about what we're going to use the data for: the latency between when the data lands and when we can get insights from it, and how that helps our business operate. Do we really need queries that come back in a few seconds from our analytical database, or is it fine to run the report for an hour or a few hours? Both can be fine. And when you talk to streaming people, of course, everything looks like, yeah, we can tackle any task. But the cool thing with streaming is that the available tools can solve both streaming and batch problems, whereas with batch tools, solving streaming problems is not only cumbersome but in some cases impossible.

Yeah. And it's one of those things where real time is usually a requirement that comes from the business.
I've seen this firsthand: with Striim, we work with so many customers. For instance, we did a case study at Data and AI Summit with American Airlines, and they essentially have SLAs in seconds, because if aircraft stay on the ground and aren't serviced immediately, that leads to delays, which leads to unhappy customers, which leads to revenue loss and things along those lines. The way I see this across many industries is that if you can tie your data products to revenue, latency usually becomes pretty critical: you don't want seconds of delay in your path to revenue. Now, for long-term storage and reporting, that stuff's not going away, obviously, and it doesn't always require real time.

Exactly. Exactly.

So it really depends on the business use case. I don't love it when anyone says, oh, just go with one, the other doesn't work, or the other one's bad. There's always nuance to these discussions. I do see a lot of situations where teams default to batch in scenarios where they really should be doing streaming, and it's not intuitive, because it's not just about real time; it's also about data consistency and correctness. For instance, if you're doing change data capture from a database and you're doing it in batches, that's actually setting you up for duplicates and incorrect data in your downstream reports, and reconciling that is very tedious from a people perspective and also an expensive operation if you're doing resyncs and moving lots of data to reconcile it. So that's one area where I see people tripping over themselves when they jump into batch instead of thinking about streaming. Do you have any other scenarios where you should be using streaming and real time, even though it isn't intuitively obvious to go with it in the first place?

I 100 percent agree with your sentiment around the change data capture use case, because ultimately, if you think about it, the underlying transaction log is also a stream of events, and they are immutable. Any operation on your database, any mutation, an insert or a delete, needs to be ordered in history and needs to be immutable, so you can rematerialize the database in case of failure. I think the question isn't really about choosing one or the other; it echoes my earlier point about education, explaining that streaming leads to a slightly different way of looking at your data. One of the challenges (I don't want to use the word problem) in my time talking to developers about how event-driven microservices can be built, and how Kafka can help establish communication between systems, was helping people properly understand what a stream is versus a table in the sense of stream processing, and how one turns into the other.
Once people start understanding stream-table duality, and I know it sounds too geeky, like the particle-wave duality of light, but it's a very similar thing, they realize that state is actually not something to worry about in the way they usually do with a traditional database. Essentially, what you store in a traditional database, relational or whatnot, is state. If you have a list of customers, you have a current snapshot of customer information, so when you do SELECT * FROM customers, you query the state. In the world of streams, state doesn't exist by default, because everything is a sequence of events that happened over time, and it's difficult for many people at first to comprehend how you can derive the state from the history. One example I've liked to use in the past: your ledger is your history, and your current balance is your state. Sometimes you only care about knowing your current balance, so you query the state, but you can always materialize your state from the history. That becomes very difficult if you don't have a proper technique, like change data capture, to capture changes of state. Say we're doing an INSERT of first name, last name values like "Viktor Gamov" into a database: we need a facility that captures not only the new value but also the previous value, so we can do something with that data. It may not be that obvious for customer data, but it is obvious for order data, because an order can be in multiple statuses: order placed, order fulfilled, order shipped, delivered. Without the history of the order, it would be difficult to track. That's an inherently streaming use case, because you're capturing the state of the order over time. Can you do this with relational databases? Of course. But it would be super cumbersome: you won't be in third normal form, you'll have a denormalized version of your table with repetitions and things like that. And at the end of the day, your users might only be interested in the current state of the order, not the previous history, while your quality department, which is responsible for minimizing the delay between when an order is received and when it's fulfilled, is interested in the history. They're interested in the same data, but in different aspects of it. So, and maybe it's professional deformation that I see everything in extremes right now, once I changed my mind, I never looked back. Even in an inherently batch system, streams change the ways the data can be reused, in forms you didn't expect.
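To make Viktor's ledger-versus-balance duality concrete, here is a minimal Kafka Streams sketch (topic names and types are assumptions): the stream of ledger entries is the history, and the aggregated KTable is the materialized state, the current balance per account.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class LedgerDuality {
    public static StreamsBuilder topology() {
        StreamsBuilder builder = new StreamsBuilder();

        // The ledger: an append-only stream of signed amounts keyed by account id.
        KStream<String, Long> ledger = builder.stream(
                "ledger-entries", Consumed.with(Serdes.String(), Serdes.Long()));

        // The balance: state materialized from history by summing every entry.
        // Querying this table answers "what is my balance?"; replaying the
        // stream answers "how did it get there?" Same data, two views.
        KTable<String, Long> balances = ledger
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .reduce(Long::sum, Materialized.as("balance-store"));

        return builder;
    }
}
```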
And the way this comes to fruition in the enterprise, the kind of problem we're pointing at, is the classic scenario where a business user pulls a report and says, these numbers look wrong. Why are they wrong? Because you're passing current snapshots around to different storage layers without looking at the history, the log of changes over time. This creates the very classic inconsistency that can happen with stale batch processing systems: I ran a query against one table, tried to load the results into another, did a merge, and in that time frame there was an update or a primary key change, any one of those things that you're not reflecting. So it definitely impacts the business, and that's the side of data that a lot of folks who are stuck on batch either find too complicated or don't want to think about, because it raises the question of the quality and accuracy of all their reports if they're built like that.

I'd add to this use case, because people can also use it as a counterargument to streaming: okay, how can streaming solve this problem, given that some events can arrive late? That was actually a big concern when some of the modern tooling was designed and developed. Flink, Kafka Streams, and some other technologies thought hard about how to account for what we call late events. Late from the perspective of the business, maybe, but the modern technologies know how to account for them: there are different strategies for handling this, like window sizing and fallback logic, how long a window should stay open just in case there are late events. So that's my argument for data consistency and data clarity. We used to talk about the Lambda architecture versus the Kappa architecture, where stream was mixed with batch, and stream processing was used more as the system that gives an immediate result, though that result might not necessarily be accurate or 100 percent correct because we don't yet have the full data. I'm not saying these things cannot happen; they can. But there are tools that have already thought through how to solve this problem. It just takes a little time to adjust to this way of thinking about streams and about how the data flows in.

Yeah. In our product Striim, we came across late-arriving events especially, and we solved it with something called a grace period on the stream, which a user can configure on a specific timestamp. That's how a lot of our customers are able to do exactly-once processing. And like you said, Flink has a similar implementation.
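In Kafka Streams, the same idea shows up as a grace period on a window: the window only closes after the grace interval, so late-but-in-grace events still land in the right bucket. A minimal sketch follows; the topic name and durations are illustrative, and Striim's and Flink's mechanisms differ in detail.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;

public class LateEvents {
    public static StreamsBuilder topology() {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("payments", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                // One-minute windows that still accept events arriving
                // up to 30 seconds after the window's end time.
                .windowedBy(TimeWindows.ofSizeAndGrace(
                        Duration.ofMinutes(1), Duration.ofSeconds(30)))
                .count(Materialized.as("payments-per-minute"))
                .toStream()
                .foreach((windowedKey, count) ->
                        System.out.println(windowedKey + " -> " + count));

        return builder;
    }
}
```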
And there's definitely a lot of tooling, and even fully managed services, that make this just as simple as batch. Now tell me, what do you see as the future state of modern data architectures? I'd love to hear your perspective there.

Very good question about how data architecture will change. I'm sorry for bringing this up, but I think AI is a huge driver right now. I got a similar question back in my API days, when people were asking, okay, what's the AI strategy for API gateways? And there are many aspects here. First of all, the way models need to receive data: getting real-time data to the models is still an important thing, and delivering changes, updates, and events is very, very important for AI technology. There will be a lot of ways systems will need to be created for AI models, like the special vector-type queries that will inevitably become available as part of these tools, because that's what the market and the communities are asking for. Beyond running structured queries on top of data, indexes are the bread and butter of databases, of analytical databases: you need to somehow hint which data is more important to you, and have the ability not only to index the data but also to compute distances between words in order to establish correlations, connections between the words. I think that's going to be a big thing across many, many tools. For the last two years at least, I've seen all sorts of implementations of vector databases, which essentially drive this AI revolution. So we'll see. I'm excited right now, and I'm excited to get back into the data world.

So you're back in the data world. Tell me about StarTree.

Yeah, I'm excited to get back into data infrastructure technology. At StarTree, we contribute to Apache Pinot, and we also build a managed service on top of Apache Pinot that lets you build real-time analytics systems. Again, it's part of the tool set: another hammer, another drill, a cool tool you can put in your tool belt when you're building systems that require low-latency queries on top of analytical data. Say you want to know the current load on the restaurants on your platform, so you can send out coupons if you see some restaurants are slow. And you're interested in this data not tomorrow; you're interested today, because dinner time is coming. We can wait minutes; hours would probably be too late. So you're scrolling and you see a timer saying, hey, there's a deal in your area if you book right now. How do you think they know this? They know it because they have this information from the restaurant, they see how loaded the restaurant is, and they also have information from the drivers, so they know how many drivers are out there.
So a driver can pick up your order along with other orders, and since they're hitting the same location, they can give you a discount: your order is put together with someone else's, and the driver optimizes the route. This is also done by systems like a real-time analytics database, like Apache Pinot. As a matter of fact, Uber Eats is a known user of Apache Pinot, and a few of the people who built that Pinot infrastructure at Uber joined StarTree to continue building the solution on top of it. So yeah, I'm excited. I never knew my professional career would be so tied to data and data systems; I always thought I would be developing games. But I'm excited. People like to say data is the new oil, and that sounds about right: you mine the data and get new and different insights out of it. That's the exciting part of my job, that I can learn use cases from one group of people, talk about those use cases, and help other people with their different use cases. Combining data and DevRel makes it the dream job.

Absolutely. And Apache Pinot is a very interesting technology. I remember, just over Thanksgiving break, I was looking at Stripe, the payment processing company. They had a Black Friday transaction tracker, and my understanding was that it was also built on Apache Pinot. It was a classic real-time, data-in-motion use case where you could see transactions updating in real time, and I'm sure change data capture was somewhere in that pipeline, CDC-ing the data into Pinot. So it seems like this is changing the game in terms of what's possible with real-time analytics.

This technology lets you make decisions that are maybe not strategic, though sometimes they can be strategic, because you don't have to wait for the report until tomorrow; you get the insights in minutes, so you can make more tactical decisions. If you think about it, missing your sales in a restaurant maybe won't ruin your business today, but in the long run, if a set of tactical decisions fails, your strategy can fail. This is where real-time analytics, or user-facing real-time analytics, can be exposed to restaurant owners: they can see the load and the current situation with orders in the area, and they can make decisions, like I said, issuing coupons to boost their sales. And the engine running inside Uber can also make decisions about showing notifications to people interested in using that type of coupon. So yeah, that's the idea.
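As a hedged sketch of what such a user-facing query looks like: a Pinot broker exposes a REST endpoint that accepts SQL, so an application can ask, say, for per-restaurant order counts over the last five minutes. The table and column names below are invented for illustration, and the broker address would depend on your deployment.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PinotQuery {
    public static void main(String[] args) throws Exception {
        // Count orders per restaurant over the last 5 minutes.
        String sql = "SELECT restaurantId, COUNT(*) AS orders FROM orders "
                + "WHERE createdAt > ago('PT5M') "
                + "GROUP BY restaurantId ORDER BY orders DESC LIMIT 10";
        String body = "{\"sql\": \"" + sql + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8099/query/sql")) // broker endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```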
Very cool. Viktor, where can people follow along with your work?

I think the best place to start would be dev.startree.ai. That landing page is where you can learn everything about me, Apache Pinot, and real-time analytics, with links to the YouTube channel where we do educational content; the Real-Time Analytics Podcast is there too. So it's dev.startree.ai.

Very cool. Viktor, thank you so much for joining today's episode of What's New in Data, and thank you to everyone who tuned in. Viktor, we'll definitely be in touch, and I'm excited to see your work there.

Thanks so much, John. Thanks for having me. And as always, my name is Viktor Gamov, and have a nice day.