What's New In Data

Navigating the Future of Data Quality with Telmai's Pioneer Mona Rakibe

April 05, 2024 Striim

Unlock the secrets to maintaining impeccable data quality as we chat with Mona Rakibe, the trailblazing CEO and co-founder of Telmai. Mona takes us on her extraordinary journey from the trenches of engineering to the helm of a revolutionary data observability company, revealing how her partnership with Max Lukichev, a maestro in distributed computing, has crafted a platform at the forefront of technological innovation. Together, they're automating the heavy lifting of data management, integrating machine learning to scale data quality, and providing an inside look at the challenges businesses encounter in securing data reliability.

This episode is a treasure trove of expert knowledge that reshapes how we view machine learning's role in data quality and the necessity of human intuition in the process. Discover how Telmai's intuitive feedback systems and architectural decisions empower teams to ensure data integrity, and how decentralized data quality management infuses agility into their operations. Mona and Max's brainchild is not just another platform; it's a beacon guiding enterprises through the complexities of modern and legacy systems. By focusing on data quality from the source and adapting to unique business needs, Telmai is setting the gold standard for data reliability, making this conversation a must-listen for anyone who values the integrity of their data ecosystem.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data architectures, and analytics success stories.


Hello, everyone. Thank you for tuning in to today's episode of What's New in Data. We have a great talk track coming up around data observability and AI. I wanted to introduce Mona Rakibe, CEO and co-founder of Telmai. Mona, how are you doing today? I'm great, John. How are you today? Excellent. Really excited for our discussion. First, tell the listeners a bit about yourself. So, as John mentioned, I'm Mona Rakibe, co-founder and CEO of Telmai. Telmai is a data observability company that my co-founder, Max Lukichev, and I started roughly three years back. Both of us come from a strong background in data ecosystems, specifically data platforms, and we really wanted to solve the problem of data reliability with a foundation of machine learning, statistical analysis, and technology that's relevant to today's ecosystem. So that's a little bit about Telmai: a three-year-old, seed-stage startup in the data observability space, aggressively marching toward building a strong, reliable data foundation. Excellent. And I would love to hear the story behind starting the company. So how much time do you have, John? Is it a short answer or a long answer? As long as you want — we're here for the long haul. Fantastic. I would love to do that. So my background is predominantly engineer turned product manager turned founder and CEO. Max, my co-founder, has always been on the technology side: distributed computing, data, query optimization. He has a PhD in data. I've worked at large companies like Oracle and EMC in the enterprise space, but my last stint was at Reltio, which is a master data management company. For the folks who don't know, Reltio is a cloud-based master data management company, which is all about how you deduplicate data and get a 360-degree view into your data.
Now, the interesting part of Reltio was that I joined them as a founding product manager, very early stage, and Reltio was ingesting and managing data for the top pharmas of the world, large enterprises. So imagine: we are bringing first-party, second-party, and third-party data into multi-cloud, multi-tenant systems in a safe, secure manner for enterprise data, processing that data, deduplicating it, doing a lot of fun stuff with the data. And imagine the amount of problems that can happen in a pipeline like this, and how often our heads would be on fire because some things would go wrong. There would be consistency issues, or customers would say, look, the data is not looking right, and so on and so forth. Max was brought into Reltio — I was the founding product manager — to shift their thinking toward big data and machine learning; he was doing machine-learning-based matching and all of that. And that is where we really saw, at scale, when you bring in so much data that is constantly increasing and passing through different systems, the types of issues that can happen and the downstream impact they can have. At the same time, just like everybody else, we ourselves felt this problem could be solved easily by throwing in a bunch of validation rules, SQL queries, and consistency checks between systems. We tried solving it; we built a data quality product on top of Reltio. But we realized this doesn't solve the scale issue. It doesn't solve the heterogeneous nature of data systems in a pipeline. It doesn't solve the issues around the data formats we can support. But obviously, at that time, Reltio had a bigger problem to solve, which was master data management.
But then Max left Reltio and joined SignalFx, an infrastructure observability company, as a head of engineering there. And that's where he saw how far ahead the infrastructure companies were in taking data points from telemetry data and log data, aggregating those data points, and predicting the health of systems. And that's where he thought: we can address the data quality problems we saw at Reltio in a similar and even better way. Again, at Reltio we are talking about ten years back, with data from different systems — most companies are moving in that direction now because they are hitting the problem of scale. Everybody's hungry for more and more data; this is only going to get bigger. So when Max went to SignalFx, he felt the problem we had seen would only increase with the scale of data and the adoption of AI, and that we needed a solution that is automated, relies heavily on machine learning, data science, and statistics, and takes the compute power we have today to solve data quality problems of this type, at this scale. And that's when he picked up the phone and said, hey, remember those data quality problems we had? I have an amazing idea for how we can solve it. One thing led to another, and before we realized it, we had Telmai incorporated and we were prototyping and building it out. We got into Y Combinator that year, so we got our first round of funding through Y Combinator, raised our pre-seed round, and did another seed round of funding this year. We have great investors, and we have customers and partnerships built out. It's been a really good ride for us, because our problem statement is very relevant today and is only going to grow.
And the second thing is that our solution approach resonated a lot with the industry. Amazing story, and thank you for sharing that. It's great to see those very early success signals, with Y Combinator and other great investors on board through the process. You know, from the perspective of a data practitioner — let's say someone building the pipelines that ultimately serve the reports and use cases in the business — how would I work with your product, and what problem is it solving? Yeah. So the first thing is: data quality should be a foundational piece. It is almost like hygiene. It should be a best practice from the start. Think about how a lot of companies are now thinking about AI — explainability is one of those things where the more mature companies are saying it's going to be a foundational piece of every model that gets built. Similarly, when you're doing any data initiative, you should first think about the guardrails that will ensure the data flowing through your pipeline is reliable. What automation and what checks do I have in place so that I'll be able to explain the basis of the decisions my analytical teams are making, and the data my models have been trained on? And there are other use cases; compliance is a good one. Regulated industries are a little further ahead here, because compliance forces them to track where the data is coming from and how clean it is. The way to think about it is: first, how much of this is easily automatable? The challenge we have today, John, is that most people are technical, and they feel this is an easily solvable problem — throw in some validation rules or do some DQ checks. But at that point, we don't think about composability.
Will this work if I move from Redshift to Snowflake? If I'm connecting my CRM system, will the same validation rules work for that system? What if tomorrow I'm moving to a streaming platform? Teams typically start developing something themselves. I would ask them to look hard at build versus buy — not because I'm a vendor with a product to sell, but because of scale, future-proofing, composability, and all of that. So, first things first: put data quality and data trust in the foundation of your data pipeline and your data initiatives, make it transparent, and add those hooks. And second, do a thorough analysis of whether what you are building or onboarding scales with the needs of your company and your use cases. Those are a couple of things I would start with. Absolutely. And future-proofing is definitely one of those very important things for data teams to think about, because yes, you may have a tactical problem on hand that can be solved with scriptware and things along those lines — like you mentioned, you can have data quality checks in some Python script or some jobs that you run — but it's ultimately not scalable, and it's not composable. So what are some of the other challenges you see around building trust in data within the business? So there's also the distributed nature of data. Data quality itself is a decentralized function. A lot of times I talk to a central data team or platform team who is responsible for defining the data quality frameworks, the checks, the foundation — but the business teams consuming this data have a very different definition of what data quality means. And when I say data quality, I don't just mean freshness of data or schema; those are the easier metrics, and they're pretty consistent across a decentralized model as well. But when you go beyond that to data accuracy, and you're trying to decentralize and have different data products, everybody has a different definition of what those things mean. So how do you bring those business teams together and arrive at an aligned definition — or, at the very least, the ability to build data products where each product has its own definition of data quality, SLAs, and so on? That's another challenge: the distributed nature of data quality, and the processes it requires, cause a lot of friction — finger-pointing at the source teams over bad data. The business team expects something else and blames the platform team for not doing the right checks, but the platform team usually doesn't even understand the business team's context. So I always feel it's on tools like Telmai — on the vendors building tools — to eliminate this type of friction by building transparency. There will always be these tensions, but if you automate a lot and build transparency around it, the friction between organizations can be reduced. So that's the other thing I consistently see: the publishers and the consumers of the data have different definitions. Yeah, and I really think about this as a journey. There's this constantly evolving, continuous cycle of feedback from the business back to the data team, which goes to the upstream engineering teams that are producing the data one way or another — whether it's coming from your product, your production microservices capturing user data, or your marketing and sales systems, et cetera.
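The "each product owns its own definition" idea can be made concrete. Below is a minimal, hypothetical sketch — the class and field names (`DataQualitySpec`, `freshness_minutes`, and so on) are illustrative assumptions, not Telmai's API — showing how a platform team might own a shared metric vocabulary while each data product declares its own SLAs against it:

```python
from dataclasses import dataclass, field

@dataclass
class DataQualitySpec:
    """One data product's own quality definition, evaluated over shared metrics."""
    product: str
    freshness_minutes: int                                  # max tolerated staleness
    required_columns: list = field(default_factory=list)
    max_null_pct: dict = field(default_factory=dict)        # per-column null tolerance

    def violations(self, stats: dict) -> list:
        """Compare observed profiling stats against this product's SLAs."""
        problems = []
        if stats.get("staleness_minutes", 0) > self.freshness_minutes:
            problems.append("stale data")
        cols = stats.get("columns", {})
        for col in self.required_columns:
            if col not in cols:
                problems.append(f"missing column: {col}")
        for col, limit in self.max_null_pct.items():
            observed = cols.get(col, {}).get("null_pct", 0.0)
            if observed > limit:
                problems.append(f"{col} null rate {observed:.1%} > {limit:.1%}")
        return problems

# The security team's spec cares about masking; marketing's cares about segments.
security_spec = DataQualitySpec("pii-audit", freshness_minutes=5,
                                required_columns=["ssn_masked"],
                                max_null_pct={"ssn_masked": 0.0})
marketing_spec = DataQualitySpec("campaign-segments", freshness_minutes=1440,
                                 max_null_pct={"segment": 0.05})
```

The same profiled dataset can then be checked against both specs independently, so the platform team computes metrics once and each consumer applies its own definition of "good."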
So the way I've described it before is that data teams are almost the glue between the business and engineering — and in a lot of ways the production systems, which includes ops teams. So it's really great to see innovation coming in to help glue that all together, such as what you're building with Telmai. What are some practical steps and guidance you'd give data practitioners to ensure that type of trust, on more of a long-term view, for their business users? So definitely, for data teams: build tooling and reports that non-technical teams can understand — and sometimes business teams are technical too, just not always SQL-savvy. And again, I can speak on behalf of Telmai: we have user interfaces that are definitely designed toward a more decentralized approach. We always need a human in the loop. That's the thing about machine learning: you can predict as much as you want, but you definitely need a human in the loop — and that human should find it so intuitive to just give feedback to the system on whether the data looks right or wrong. Like when Google shows you images and asks, is this the same person? Pretty much like that. Take the business teams' feedback; give the business team a way to see what rules you are defining, what the impact of a rule is, how much data gets impacted. But the other most important thing — and I'll say this out loud — is that data quality is not just about incidents. Data quality needs to be automated. I do not believe any data engineer wants to sit there doing incident management. No. They want to build good products; they want to orchestrate the pipeline and do the fun and interesting parts.
We have to think of data quality as a way of orchestrating your pipeline, and that's how data practitioners should think. Take the example of the medallion architecture: you have your bronze, silver, and gold layers. The whole premise is that as the data flows through, it gets more and more usable, until at the final stage it's absolutely endorsed, clean, and trustworthy. So as a data practitioner, what can you do? Apply a tool like Telmai — or if you really want to build it yourself, build it out. But don't just build it to get notified on Slack and email. Email fatigue is real: if a tool sends me ten emails a day, that tool is noisy to me. So when you implement data quality, orchestrate and automate. When data moves from stage one to stage two, implement design patterns like a circuit breaker: if the data doesn't meet a certain SLA, stop that load, reload it, do the most obvious steps again — use the orchestration tool to redo that step. The second pattern we love at Telmai is data quality binning. In stream, look at the data. If you know the next system cannot tolerate duplicates, and for whatever reason there are duplicates, stop the suspicious data: park it in a separate S3 or GCS bucket, let the good data flow in, and take your time to review it, so that the performance and SLAs of the pipeline don't get impacted. Imagine how much trust the business teams will have in your gold-standard data if you do that. We need to stop thinking of data quality as pure reporting and pure incident management. It should be all about how we orchestrate remediation workflows. Even in the CDC scenario — constantly replicating data — I haven't met anybody who says there are no inconsistency problems between those systems. So what do we do after that?
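The two patterns just described — circuit breaker and data quality binning — can be sketched in a few lines. This is an illustrative toy under stated assumptions (a volume SLA as the breaker condition, local files standing in for S3/GCS buckets, a duplicate check as the binning predicate), not Telmai's implementation:

```python
import json
from pathlib import Path

def promote_with_quality_gates(records, min_expected, silver_dir, quarantine_dir,
                               is_suspicious):
    """Promote bronze records to the silver layer with two quality gates:
    a circuit breaker on the volume SLA, and binning of suspicious rows."""
    if len(records) < min_expected:
        # Circuit breaker: the load misses its SLA, so halt the promotion
        # entirely instead of propagating a partial load downstream.
        raise RuntimeError(f"load halted: {len(records)} < {min_expected} expected rows")

    # Binning: park suspicious rows aside, let the good data keep flowing.
    good, suspect = [], []
    for rec in records:
        (suspect if is_suspicious(rec) else good).append(rec)

    Path(silver_dir).mkdir(parents=True, exist_ok=True)
    Path(quarantine_dir).mkdir(parents=True, exist_ok=True)
    Path(silver_dir, "batch.json").write_text(json.dumps(good))
    Path(quarantine_dir, "batch.json").write_text(json.dumps(suspect))
    return len(good), len(suspect)

# Example binning predicate: flag duplicate ids for human review.
_seen_ids = set()
def dup_check(rec):
    if rec["id"] in _seen_ids:
        return True
    _seen_ids.add(rec["id"])
    return False
```

In a real pipeline the same gates would be wired into the orchestrator (a retry on the breaker, an alert on the quarantine bucket) rather than raising an exception.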
So it's a huge cycle of days, weeks, or months. Yes, you can get some level of investigation, but can we apply monitoring right there, and if there are inconsistencies, do something about it in an automated way? That's another recommendation I can make to data practitioners: think about automation, think about orchestration based on SLAs around data quality and data reliability. Yeah, that's excellent. And I wanted to drill into one keyword you used, which I hear a lot but which means different things to different people: decentralization. In the context of data trust, data quality, and observability, how do you think about decentralizing that? So, specifically in the context I was talking about: there are still many companies who have their main data set in central storage, whether it's Redshift, Snowflake, or a Delta Lake or data lake ecosystem. And then there are different teams taking this data — I won't get into each company's architectural preferences, that could be a session in itself — but often the same data has different product owners. If my security team is taking this data, they care about certain data quality metrics. This is PII data, and my security team, which is building reporting tables or even machine-learning-based models, wants to know right away — within seconds — if the masking logic has failed, or a data retention policy was violated. Now, the same data set, from the same central storage, in the same organization, is also consumed by the marketing team, and they couldn't care less about some of those things. What they care about is their segmentation attributes, audience attributes, and all of that. They are all building different products, and for each of their products the SLAs, the data quality metrics, the data quality definitions may differ. And that is what I meant in the context of data quality: how do we still support distributed ownership of data quality? Does that answer your question, John, or did you have something else? I think this also ties back to the idea of rolling your own scriptware to tactically solve little signs of data quality issues — maybe you're addressing the symptoms but not the root cause. But when we talk about decentralizing data quality, like you said, you're really thinking about the product owners and that long-term view of future-proofing the investments the business makes in its analytics and data infrastructure. I think this allows a lot of agility in the business as well, because companies want to be agile and flexible around business use cases; they don't want to be tied down by infrastructure. It's not going to be solvable with the snap of a finger, but you always hear this from the business: we would do so much more if the technology enabled us to do it. And I think legacy implementations are really pulled down by all this piecemeal scriptware and things along those lines. So that's why decentralization of data quality and trust is absolutely critical for companies to move faster and innovate. Yeah. And it's funny, because here we are talking about decentralization, but in fact what we are saying is that we need a central data quality tool that can connect to all these decentralized pieces of your puzzle. So, if you talk about the legacy tools, most of them were driven by ETL, because that's where the biggest problems are, right?
It's usually when you are extracting and transforming, so those tools were almost built for that ecosystem, designed for that ecosystem and for specific systems. And I 100 percent agree that a lot of issues arise closer to the source, especially the real defaulters: the data did not land at all, a load failed, there were issues with it, or your transformation logic had errors, or even the stream had issues because of which data didn't move. So a lot of actual tooling or system problems can arise there. But then there is also the accuracy of the data, which gets closer to the business. Imagine a unicorn-world scenario where everything went through fine, but the business still comes back saying the data looks off. And this could just be because somewhere the marketing team rolled out a campaign, because of which business suddenly spiked and the revenue numbers look different. This is actually a true business signal — and we as a team together don't know whether it's a data quality problem or a true business indicator, a leading indicator we need to take into account. So that's another problem our industry has: we don't even know which are false positives and which are true positives. Yeah, absolutely. The data pipeline, like you said, could be completely functional: moving data, running all the jobs in flight, the transformations going from bronze through the gold layers all working great. But the data is wrong for whatever reason, and people end up going back to the source because of that, to do the validation one way or another.
They're going directly to the database or the ops platform — Salesforce, whatever it is — to do that comparison. And right now it seems like there's no way to get around it, but based on the innovation going on in the space, it looks like we're going to have solutions there soon. Yeah. And look, machine learning really helps here; this is one problem it solves well. Although Telmai is an AI company, started with an AI-first mindset, we always feel that not everything can be addressed by machine learning. Some things are better done by rules, some by simple distribution analysis and statistics, and some are definitely much more powerful with machine learning. A combination of those is the sweet spot for solving many use cases. I'll give you an example. Within a single schema, if you want to look at the distribution of categorical data — say, country codes — simple distribution analysis is enough; you don't need any machine learning prediction models. But row count is time-series analysis: you want to predict a proper threshold, and that's very hard to do with a rule-based approach. You can do checksums and all, but your business is growing, and the row count is growing with it. And John, you've been in the data space long enough — you're familiar with data stewards and how they look at data. They spend hours eyeballing it. I have met data stewards who say, look, Mona, I look at the data and know exactly what's wrong with it. I've spent 20 years looking at pharma data, and I know there's something off about this doctor's record. How do you bring that in, and how do you do it at scale? Is it even possible at today's volume? It's almost impossible, because everybody's hungry for more and more data.
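The row-count example above — predicting a threshold from history instead of hardcoding a rule — can be sketched minimally. The 3-sigma band over a trailing window is an assumption for illustration; production systems typically use seasonality-aware models rather than a plain rolling mean:

```python
import statistics

def row_count_band(history, window=7, sigmas=3.0):
    """Predict an acceptable (low, high) band for the next load's row count
    from the trailing window of observed counts."""
    recent = history[-window:]
    mean = statistics.mean(recent)
    stdev = statistics.pstdev(recent) or mean * 0.01   # avoid a zero-width band
    return mean - sigmas * stdev, mean + sigmas * stdev

# Daily loads hovering around ~1,000 rows:
history = [1000, 1020, 980, 1010, 990, 1005, 1015]
low, high = row_count_band(history)
```

A load of 1,000 rows falls inside the band and passes silently, while a 10 percent drop to 900 rows falls below `low` and would be flagged as an anomaly — no hand-maintained threshold required as the business grows.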
So this is where machine learning — using high compute, scanning the entire data set at scale with distributed computing, leveraging a tool like Spark — will accelerate this path. How can we empower those data stewards, the people eyeballing the data for accuracy? Give them a cohort: this is what we suspect could be wrong, based on the historical distribution of the data versus the current data. Imagine the productivity of that team at scale. So we can solve a lot of the things you and I just spoke about — like how do we know whether something is a false positive or a true positive — with the technology we have today. I do feel we are jointly going to make a very big difference in the data quality space in the coming years, because we have the tools and the technology to support these things. Yeah. You made such a great point, and it really does require a platform flexible enough for teams to express their own business context. Let's say that in your CRM you have an opportunities object, and the real way to see the lineage and journey of a customer is to look at three specific fields. But when the ETL tool goes in there, it's going to pull all the fields — and maybe some fields are deprecated and not used anymore. There are all these cases where you actually have to know what's going on in the business to build the best possible reports. You can't just have a data engineer eyeball the fields and say, oh, these look like the right things to capture for our business metrics. You need someone who actually knows the operational system.
And with databases, it's endlessly complex — especially in the enterprise, where you're working with maybe 30 years of technical debt and things along those lines. Or even more, right? Depending on whom you're talking to — I've met people who have hundreds of fields of data and more data sources than employees. I've heard all sorts of things. The lucky ones are the digitally born companies, born in the era of the modern data stack, who only have to worry about two or three different cloud-based systems. But the challenging part is where the vast majority of our ecosystem lives: heterogeneous systems with a lot of different data formats and complexities, decades-old data, and all the fun things. Yeah, absolutely. And even in that scope, understanding all the quirks and legacy attributes of the data is required to build these types of business applications and reports. I'm curious to hear whether Telmai is flexible enough to handle those types of conditions. I want to make sure I understood the question — is it conditions in terms of technology, or did I miss that? Maybe you can help me understand. Yeah — let's say I'm a data end user and I want to ensure data quality at a broad level. Am I able to work with my sources, and any legacy fields and things along those lines, while using your product? Yeah. So one of the biggest architectural decisions we took — because of which our product took longer to build than others — was that we did not want to build a SQL-based data quality tool that only works on SQL data sources. The architecture we have, John, is: take any data source.
Most of your audience comes from data ecosystems, so this gets a little into the weeds, but take any data source — whether it's streaming, a flat file, a JSON file, an S3 bucket, or your legacy system. Telmai just needs to read from that data, and every system has a driver. What Telmai does is read the data and convert it into a skinny table. Then we use our metric calculation system — that's our secret sauce. No matter what the underlying system is, we start understanding the data: our metric calculation system extracts millions of metrics about distributions, data types, patterns, formats, what good data looks like, ranges of the data, so that we can predict anomalies. Those metrics are what we keep for historical analysis, predictions, and all of that. Now, if you're with me on how we are architected, you can imagine that we are able to support any system that has a driver. So we started with the obvious ones — we built integrations with BigQuery, S3, GCS, Snowflake, Delta Lakes, data lakes — but through an OEM partnership we are now bringing in 250-plus integrations. So our users don't have to worry: will my tool work with a legacy system? Because that's where the challenge is. When people are moving data from SAP HANA to BigQuery and doing CDC, how do you know data did not get missed? We can do that now, because we have the data quality metrics of SAP HANA and of BigQuery — we know exactly when the distribution in SAP HANA and in your BigQuery landing zone is off, and we will monitor and report on all of that. So we are natively designed to shift data quality left, closest to the source, and to support these heterogeneous systems and semi-structured data. That's another challenge, right?
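The source-versus-landing-zone comparison just described rests on profiling both sides and comparing metrics rather than rows. Here is a hedged toy sketch of that idea — the metric set (count, null rate, value histogram) and the tolerance are simplifying assumptions, not a description of Telmai's metric calculation system:

```python
from collections import Counter

def column_metrics(values):
    """Profile one column into a small set of comparable metrics."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_pct": 1 - len(non_null) / len(values) if values else 0.0,
        "histogram": Counter(non_null),
    }

def drifted(source, landing, count_tolerance=0.01):
    """Flag when the landing zone's profile deviates from the source's."""
    issues = []
    if abs(source["count"] - landing["count"]) > count_tolerance * source["count"]:
        issues.append("row count mismatch")
    if source["histogram"] != landing["histogram"]:
        issues.append("value distribution changed")
    return issues

# Two records lost in transit during a hypothetical CDC replication:
src = column_metrics(["US", "US", "DE", "FR", None])
dst = column_metrics(["US", "US", "DE"])
```

Because only compact metrics cross the wire, the check works the same whether the source is SAP HANA, a JSON file, or a stream — which is the point of profiling at the source rather than re-querying it row by row.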
We've gone past very structured data: people are using arrays, nested structures, and so on. We had a customer who had an entire JSON document in a single schema attribute, and we had to support that. How do you read it, expand it, and understand the internal structure on the fly? How do you do deduplication and uniqueness checks across arrays? Those are the capabilities that really help accelerate the whole thing. And this goes exactly back to my point: people feel like they can throw in a bunch of validation rules and get this problem solved. But then they get into: how do I do it for semi-structured data? How do I do it for event streaming? I've done it for Oracle, now how do I do it for my CRM system? Those are the things that need to be thought through very well when you design something like this.

Absolutely. And I love that one phrase you used: shifting data quality left, closer to the source. What's the value of that?

So think about it, right? You got partial data; there are so many examples, like 10 percent of the records got missed. It goes through transformation. Data is transformative by nature, and you're definitely in the middle of all of that, right? As the data goes through its journey, it gets augmented and improved. If we go back to our medallion architecture example: those missing records go to the next stage, then the next stage, and then business finds out about it. Now, who knows what exactly went wrong? It's a snowball effect. You have to go use lineage to track back where this issue could have arisen, and with missing data especially, how do you even know? Because there are still reports on the data that has been pushed out to business, or, if you're a data provider, pushed out externally, and you don't even know at which source this happened.
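The on-the-fly expansion of a nested JSON attribute that Mona describes earlier in this exchange can be sketched as a small recursive flattener. This is an illustrative assumption, not Telmai's actual mechanism: each nested field gets a dotted path, and array elements get an index, so profiling and uniqueness checks can run per path.

```python
import json

def flatten(obj, prefix=""):
    """Recursively expand a nested JSON value into flat dotted paths.

    A simplified illustration of discovering a blob's internal structure
    on the fly; array elements get an index in the path so uniqueness
    checks can run across them.
    """
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = obj
    return out

# An entire JSON document stored in a single schema attribute:
raw = '{"user": {"id": 7, "tags": ["a", "a"]}}'
print(flatten(json.loads(raw)))
# {'user.id': 7, 'user.tags.0': 'a', 'user.tags.1': 'a'}
```

With the structure expanded this way, the same column-level metrics used for flat tables apply to each discovered path.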
By the time you resolve it, the business impact is already done, and that's hard to measure; sometimes people are able to measure and quantify it. But now there is finger-pointing among your data teams, because you're under this stress condition: business has found some issues, so which team caused them? There are three or four hops, before transformation and after transformation, and going back, the cost of finding the root cause has become exponential.

On the other hand, if you had a proper observability tool like Telmai, which knows that historically you've been getting X million records every hour, there's predictability. If we receive 10 percent less, that is detected as an anomaly; you know exactly which source got what amount of data and what was missed. And maybe, as I said, it's a true positive, but there is no fire drill, and there's no cost of remediation or of tracking it back. Even better, and this is my favorite thing: if you had orchestrated next-best actions in the pipeline, and you see there is something amiss, you have already automated stopping that bad data from going downstream.

Yeah. And it's very important for teams to think like this, right, and have a very deep, intuitive understanding of the upstream data they're working with, so that they can better provide value to those who are consuming that data in whatever manner it may be. It could be a report; it could be a reverse ETL back into Salesforce. And there's all this great innovation going on in the market that, like you said, allows decentralization, right into systems of engagement rather than just pulling up a report or a CSV data dump. We're so far past that.
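The volume-anomaly idea Mona describes, learning the expected hourly record count from history and flagging a 10 percent drop, can be sketched with a simple z-score test. This is a minimal illustration under assumed thresholds, not Telmai's actual detection logic.

```python
import statistics

def volume_anomaly(history, latest, z_threshold=3.0):
    """Flag the latest batch's record count if it deviates from history.

    A minimal sketch: learn the expected hourly record volume from past
    batches, then flag a batch whose count falls outside the usual band
    (here, a simple z-score test with an assumed threshold of 3).
    """
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid division by zero
    z = (latest - mean) / stdev
    return abs(z) > z_threshold, z

# Historically ~1M records/hour; this hour only 900k arrived.
history = [1_000_000, 1_010_000, 995_000, 1_005_000, 990_000]
is_anomaly, z = volume_anomaly(history, 900_000)
print(is_anomaly)  # True: a ~10% drop is far outside the usual band
```

Catching the drop at the source, before the batch propagates through bronze, silver, and gold layers, is exactly what avoids the multi-hop root-cause hunt described above.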
So it is crucial to have that decentralized layer of data quality that does shift left, as you said, looks upstream, and gives users the ability to put business context into the raw data, which may not be readily consumable downstream.

Absolutely. I think that's why this problem is so exciting: it cuts across so many aspects of people, process, and tools. And that's why it is so difficult to solve in any organization, because shifting left involves a different team altogether. You can shift as far left as you can, but if you're buying data from a third-party provider, you can't go past a certain point. At that point, though, can you apply some systematic checks at your level, since the provider is supposed to deliver certain things? We talk about data contracts too, but contracts only work when you have control or influence over both teams. In certain cases they can't work, like with third-party data, where you don't always have influence over the provider.

Absolutely. Mona, I also want to ask you: there's so much great collaboration, evangelism, education, and thought leadership going on in the industry. What are some of the in-person events that you're excited about?

I'll tell you what I like: I love meetups. I love meeting 10 or 15 people at a time and having deep conversations, so I try to do that. There are a few meetups, but I'm also going to be at the Gartner Data and Analytics event, which, if you want to see scale, and what you just spoke about, John, 30 or 40 years of legacy, when you sit with some of these folks from large companies that have been around for hundreds of years, you realize the challenges they are dealing with are very different from others'.
So Gartner Data and Analytics is a great place to meet those types of people and learn a lot. Telmai is also going to be at Google Next; we are GCP partners, we are on the Marketplace, and we're excited to share our stories and demonstrate Telmai there. Those are the couple in Q1 that I'm definitely attending. And Data Universe: you remember there used to be Big Data London, and now the same crew is starting Data Universe in New York. Unfortunately it conflicts with Google Next; it's a good one I would have liked to go to if the dates didn't clash.

Absolutely. Well, I will personally be at both the events you mentioned, Gartner and Google Cloud Next, and I agree with you: they're great events. At Gartner you get a lot of great, proven technology practitioners who are innovating, and at Google Cloud Next you really get a glimpse into the latest and greatest technology that Google is providing, along with partners like Telmai and, of course, Striim, where I work, along with many other great companies. So yeah, lots of exciting in-person events coming up; it's going to be a fun spring and summer for data practitioners. And that's always my advice to data engineers and those who are working in the pipelines: go out there, meet your peers, network, build those relationships, and pick up those best practices in person, because it really does help further your career. On that note, it'll be great to connect with you there as well, Mona. Thanks for sharing the events that you'll be at, and hopefully people can meet Telmai in person.

Awesome. I look forward to seeing you there, John. It was great chatting with you today. There's always so much to share; today it was more about Telmai.
Maybe next time, when we meet in person, we'll talk about much broader topics.

Absolutely. Mona, where can people follow along with your work?

I try to post a lot on LinkedIn, but again, that's relative; relative to most people, I think I post a lot. I'm always accessible on both LinkedIn and email. My email is mona.rakibe@telmai.ai, and I'm very easy to find on LinkedIn as Mona Rakibe. The Telmai website is www.telm.ai. We try our best to post a lot; we could do more on customer case studies and what we are learning from the industry. I just got off a call with our marketing team, and we are hoping to do a lot more now, so look out for that on our blog. We also write a lot about how to think about machine learning and AI-based approaches to data quality, so that's something folks can read on our website.

Thanks for sharing. And for the listeners, the links that Mona mentioned will be in the podcast description. Mona, thank you so much for joining today's episode. It was a really great discussion, and we learned a lot in the process. So thank you.

Thank you, John. Take care. Have a good one.

Absolutely. Thank you to the listeners for tuning in. Okay. Bye.

Data Observability and AI Startup
Decentralization of Data Quality Automation
Data Quality and Flexible Platforms