What's New In Data
A Cost Optimized Data Ecosystem with AI and FinOps Expert Kunal Agarwal
March 22, 2024 | Striim

Embark on a journey with Kunal Agarwal, CEO of Unravel Data, as he unravels the complexities of managing escalating cloud costs with the sharp tools of FinOps and AI. If you've ever grappled with the challenge of scaling data operations without breaking the bank, this episode is your playbook for turning those daunting costs into a mastered art. Kunal, with his deep expertise in B2B enterprise technology, shares his insights on the inception of Unravel Data and the crucial role of AI in streamlining cloud data management. It's not just about the tech; it's about the smarts in employing it, and Kunal's tales from the trenches of data observability will guide you through the labyrinth of efficiency and optimization.

Dive into the tech mosaic of today's data platforms, where consistency is king despite the varied landscapes of Databricks, Snowflake, and others. We tackle the nitty-gritty of providing a seamless user experience and the prowess of Unravel's AI-powered insights engine in standardizing performance across systems. The chess game of maximizing ROI from AI investments also takes center stage as we dissect the importance of a cost-conscious culture supported by FinOps. Listen as we examine the fine balance between innovation and investment, and learn how to wield the double-edged sword of customization versus cost. Kunal's strategic vision paves the way for a future where automation and economic savvy coalesce, propelling data-driven enterprises to new heights.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data challenges, and analytics success stories.

Transcript

John: Hello, everyone. Thank you for tuning into today's episode of What's New in Data. I'm really excited about our guest today: we have Kunal Agarwal, CEO of Unravel Data. Kunal, how are you doing today?

Kunal: Doing great, John. Thank you for having me here.

John: Yeah, absolutely. You know, we've been talking about doing this episode for a few months now, and you're working on some amazing stuff in the FinOps space and bringing data and AI into the picture. So I would love to hear your story. First, tell the listeners a bit about yourself.

Kunal: Yeah, thank you, John. I'm Kunal, and I founded a company called Unravel Data, where we're focusing on FinOps and data observability for the modern data stack. I've always been a B2B enterprise technology guy, you could say, and the latest and most exciting technology is obviously the world we live in around data, machine learning, and AI. But what attracted me to starting this particular company is that while we all have these big goals and ambitions of achieving groundbreaking new discoveries and innovation with AI, what really holds us back is, number one, making sure those things actually work properly: the data pipelines, the underlying systems, the code running on top of them. And secondly, as our needs keep increasing along with our companies, making sure that we're able to scale our operations in an efficient manner. It's not linear; you do break the cost curve at some point. Those are the two things we usually see holding back the companies we work with from achieving their AI and data goals, and the purpose of Unravel is to level the playing field and simplify the complex operational pieces that hold these companies back.

So my background, like I said, has always been in B2B enterprise technology. I worked for companies consulting on Oracle systems. Prior to that, I was at Sun Microsystems doing grid computing, which you could call big data from back in the day. And I met Shivnath, my co-founder and CTO, while we were at Duke University; he was a professor of computer science there and I was doing my MBA. What we saw was the need for a database optimization product that simplified what people were doing with distributed systems and parallel processing systems. Back in the day it was just Aster Data that could do that, but then people started moving to a new technology called Hadoop. What we saw was that there was a lot of power in these open source distributed computing systems, but they were rough around the edges. That's where the light bulb moment came: if we could offer users of these platforms a complete product, it would help them do their tasks in a much simpler, faster way, which obviously would increase the adoption of these products and technologies.

John: So FinOps for data stacks is definitely a crucial part of making sure that you're meeting your budget goals and ultimately getting the ROI on all your investments there. Tell me a bit about the current state of FinOps for data teams.

Kunal: It can be referred to by different names; it's really a cultural practice within a company. It could be called cloud cost management, it could be called cloud optimization, but it's gaining popularity pretty quickly because companies now have significant investments in the cloud. There's also a FinOps Foundation that has defined a framework, which I kind of like.
But it's evolving in the direction of helping people understand how they should be looking at cloud costs, how they should be tracking them, and how they should be putting guardrails around that cloud spend, with the ultimate intention of asking: hey, are we scaling efficiently? Now, FinOps is being adopted; I think there was a quote from Gartner that 48 of the Fortune 50 companies have some sort of FinOps practice. But what we're seeing is that it's very immature, especially when it comes to managing data cloud spend. We're also seeing that, within the cloud costs that make up the biggest IT spend, the AI and data costs are the fastest growing. So something needs to be done about this quickly, because we're seeing this burst of AI activity and needs coming from all industries; it's not isolated to one industry. If these data clouds start to become the number one expense, then you need to start justifying these costs: you need to understand what the ROI of these costs is, how to scale them efficiently, and how to put guardrails around the different pieces. So our place in the world, when we think about what Unravel does for FinOps, is ultimately to make it simple to understand your costs, and simple to adopt the best frameworks for controlling, optimizing, and growing your data workloads efficiently in the cloud.

John: One of the things I see Unravel Data talk about is purpose-built AI, specifically for FinOps. So tell me how AI comes into the picture and allows you to offer this unique service to data teams.

Kunal: In essence, cloud data cost management, or FinOps, requires observing what's happening and then putting in guardrails. But the true value comes when you don't leave people with just what is happening; it's when you start to understand why something is doing what it's doing and how you can make it better. That part is a very difficult, cumbersome, time-consuming, expert-requiring exercise right now, and that's where we see AI can help. Let me give you an example. First of all, even understanding what's happening in the world of data is a rather complex problem. You need a granular understanding of: which user is submitting which data pipeline or AI model? How is that impacting the services it runs on? What kind of infrastructure or serverless environments is it actually consuming? And what are the shared versus unshared resources between the multiple users you may have inside your company? That's just to understand what's happening and to be able to allocate cost to who's doing what in the environment. The next question becomes: hey, are we doing things efficiently? And that's where understanding what we're doing that causes wastage or inefficient usage of the environment comes into play. Most people think of waste today as primarily infrastructure related. What we see is that plenty of issues also lie in how the code is written, how the pipelines are executed, and how the data is laid out, so there are a lot of areas besides infrastructure where wastage and inefficiency arise. So while it's a noble exercise to go and figure out how to scale these environments efficiently, it becomes really hard. It's very, very challenging.
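(Editor's note: to make the allocation step concrete, here is a minimal Python sketch of splitting a shared bill across users in proportion to the compute their jobs consumed. It is an illustration of the idea, not Unravel's implementation; the record fields and figures are hypothetical.)

```python
from collections import defaultdict

# Hypothetical per-job telemetry: who ran what, and how much compute it used.
jobs = [
    {"user": "alice", "pipeline": "daily_etl",    "core_hours": 120.0},
    {"user": "bob",   "pipeline": "ml_training",  "core_hours": 300.0},
    {"user": "alice", "pipeline": "adhoc_report", "core_hours": 30.0},
]

SHARED_CLUSTER_COST = 900.0  # total bill for the period, in dollars

def allocate_costs(jobs, total_cost):
    """Split a shared bill across users proportionally to core-hours consumed."""
    total_usage = sum(j["core_hours"] for j in jobs)
    costs = defaultdict(float)
    for j in jobs:
        costs[j["user"]] += total_cost * j["core_hours"] / total_usage
    return dict(costs)

print(allocate_costs(jobs, SHARED_CLUSTER_COST))  # {'alice': 300.0, 'bob': 600.0}
```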
Number one, the users and the FinOps practitioners on the data teams don't necessarily know what the right thing to do is. They have to run benchmarks just to understand, even in a green-yellow-red sense, how far off they are. Then, once they pinpoint what they're doing incorrectly or what is causing wastage, figuring out what to do to solve those problems becomes yet another trial-and-error exercise, which can take several days or weeks. It's like: hey, maybe let's try this thing out, maybe let's try that thing out, and if this doesn't work, let's keep experimenting until we get there. But as we know, all of these data projects are moving at breakneck speed, so we need to resolve these problems right now; otherwise you keep paying for those wasted resources every hour of the day, every day of the week. And lastly, you need to be able to solve these problems before they hairball into a big issue, rather than after, because by then you've already incurred the bill and already had the wastage. Being able to spot problems early and identify the root causes early can help you prevent them in the first place.

So those are the areas we apply AI to. One: can we show you where you should be focusing, and what the key wastages and improvement areas in your environment are? Two: once you select the entities to focus on, how can you solve the problem without trial and error, with pinpointed remediations or optimizations, whatever form that particular efficiency improvement takes? And three: making data teams and data operations teams more proactive. Can we keep these problems from creeping up on your big data or AI environment in the first place? That might mean checks and balances in your dev-to-production CI/CD pipelines, or, when things are running in prod, detecting anomalous behavior and warning you of problems creeping up in your cluster ahead of time, so you can solve them on day 15 instead of day 30, when the bill has already arrived. And we're seeing some amazing results with customers. What it really manifests as is reduced toil. People don't have to spend hours, and sometimes days and weeks, to solve these issues, and the data teams just love that. Obviously the companies benefit from lower cloud costs, but the data teams are thanking us for making sure they're not spending their time on what they call mundane tasks, rather than doing all the exciting stuff of creating amazing data.
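(Editor's note: the early-warning check described above can be sketched very simply. Below is a hypothetical example, not Unravel's engine: it flags any day whose spend jumps well above a rolling baseline, so a team could react on day 15 rather than day 30.)

```python
from statistics import mean, stdev

def flag_cost_anomalies(daily_spend, window=7, threshold=3.0):
    """Flag days whose spend exceeds the rolling mean of the previous
    `window` days by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = daily_spend[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and daily_spend[i] > mu + threshold * sigma:
            anomalies.append((i, daily_spend[i]))
    return anomalies

# Hypothetical daily cluster spend in dollars; day 14 jumps well above trend.
spend = [100, 102, 98, 101, 99, 103, 100, 97, 102, 100, 99, 101, 98, 100, 410]
print(flag_cost_anomalies(spend))  # [(14, 410)]
```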
John: Absolutely. It's a very powerful concept for data teams to be able to spend less time doing manual tasks around FinOps while ensuring they're getting better ROI from their workloads through cost optimization. Now, I'm just going to throw an example out there, and maybe you can tell me how it works. Let's say I'm running a Snowflake service at my company. What are the types of cost optimization recommendations you would make?

Kunal: Great question. On the Snowflake side, let's bucket this into two broad areas: one is the warehouse level itself, and the other is the individual queries or pipelines you're running through Snowflake. At the warehouse level, there are different types of inefficiencies. They can come from mis-sizing your warehouse, such as not actually utilizing the large or extra-large warehouses you created. Having too few warehouses can be a problem, because different types of workloads running on one warehouse need different types of resources. Or, vice versa, having too many warehouses, when you could have consolidated workloads that use the same data and the same resources onto one warehouse. Then there's starting and stopping warehouses: thinking about what kind of idle period is attainable, or even tolerable, within your enterprise. All of those areas are things Unravel examines and understands automatically for you, and it guides you: you should have this many warehouses; warehouse A should be a medium, not a large or an extra large; you can combine these 16 warehouses that run individually into one large warehouse, which could even improve performance, for instance. All of that is done live, with constantly updated models, so you can take action as you see fit.

The other side is the code level, or the query level in the case of Snowflake. Snowflake is used by many different types of users inside a company, from a hardcore data scientist who really understands the ins and outs of the system all the way to a business analyst who has used Excel before and has now been given access to Snowflake, and who may or may not know how these powerful distributed data technologies work. That means the skill sets people bring to running things efficiently and well vary widely. We see it way too often: some user spins up a SELECT * on a massive table and racks up a bill of tens of thousands of dollars overnight. They didn't mean any harm, but it cost the company all that money. Or you could have users who aren't thinking about the types of joins they're putting together, or the order of those joins, which creates inefficiencies in their code; those are some of the low-hanging-fruit items. Now, with more and more code being written with AI, generating the code is fast, but that code is usually not optimized, because those ChatGPTs and copilots aren't really aiming for efficiency; they're aiming to get you code as fast as possible. So all of that code, whether created by humans or by AI, has massive room for tuning and efficiency improvement. Those, again, are all areas that Unravel scans continuously, in real time, providing insights: hey, this is wrong; this is inefficient; this could be faster; this could be cheaper. And then you can start to take action on those pieces.
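(Editor's note: for flavor, here is a toy lint pass in the spirit of the query-level checks described above. It is a hypothetical sketch over an exported query history, not Unravel's scanner, and the field names are invented for the example.)

```python
# Hypothetical exported query history: query text plus scan and cost metrics.
query_history = [
    {"user": "analyst1",
     "text": "SELECT * FROM events",
     "bytes_scanned": 9.2e12, "credits": 48.0},
    {"user": "eng1",
     "text": "SELECT user_id, ts FROM events WHERE ts > '2024-03-01'",
     "bytes_scanned": 2.1e9, "credits": 0.4},
]

def lint_queries(history, big_scan_bytes=1e12):
    """Flag an obvious efficiency smell: SELECT * over a very large scan."""
    findings = []
    for q in history:
        text = " ".join(q["text"].upper().split())
        if text.startswith("SELECT *") and q["bytes_scanned"] > big_scan_bytes:
            findings.append(
                f"{q['user']}: SELECT * scanned "
                f"{q['bytes_scanned'] / 1e12:.1f} TB ({q['credits']} credits); "
                f"project only the columns you need."
            )
    return findings

for finding in lint_queries(query_history):
    print(finding)
```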
John: Unravel supports many data platforms, right: Snowflake, BigQuery, Databricks, EMR. What are some of the differences in managing costs between these platforms? Would you say it's a general problem, or does each platform have its own caveats that teams need to take into account?

Kunal: That's a great question, John. The short answer is that they are both very similar and very different. When you think about data systems and platforms, the underlying technology has to be some sort of database or data lake, with different types of processing needs on top of it. So you're right, we support a wide variety of platforms, from Snowflake to Databricks to BigQuery to Amazon Redshift and all these other environments. What we try to do is provide a similar look, feel, and usability across all of them, no matter what the customer is using. We've tried to standardize the pieces and features a person would use, while on the underlying technologies we look for which problems are common and which take a different remediation and have to be solved differently. I'll give you an example. You could be on Databricks or on Snowflake running SQL jobs, and any query could have similar inefficiencies, joins for example; that's an inefficiency you can see across a variety of systems. On the other hand, the way you think about Snowflake is in warehouses, while the way you think about Databricks is in workspaces and clusters, so the tuning parameters, where you can fix things and where the knobs are for driving efficiency, can be very different. But because the data is presented to the customer in a way they understand (here are the inefficiencies, here are the slowdowns, here is the room for improvement), we simplify that. In fact, customers use it as an advantage: we can switch platforms, or be multi-platform as a company, and our end users, the data engineers and data scientists, still get that same experience of getting the best out of these platforms, because Unravel abstracts that layer away for you.
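(Editor's note: one way to picture that abstraction layer, purely as a hypothetical sketch and not Unravel's actual architecture, is a common interface with platform-specific adapters behind it, so the user-facing insight reads the same whichever system produced the telemetry. The dollar rates are assumptions.)

```python
from abc import ABC, abstractmethod

class PlatformAdapter(ABC):
    """Normalizes platform-specific telemetry into one insight format."""

    @abstractmethod
    def compute_unit(self, raw: dict) -> str: ...

    @abstractmethod
    def cost_usd(self, raw: dict) -> float: ...

class SnowflakeAdapter(PlatformAdapter):
    def compute_unit(self, raw):  # Snowflake thinks in warehouses
        return f"warehouse {raw['warehouse']}"

    def cost_usd(self, raw):
        return raw["credits"] * 3.0  # assumed dollars per credit

class DatabricksAdapter(PlatformAdapter):
    def compute_unit(self, raw):  # Databricks thinks in clusters
        return f"cluster {raw['cluster_id']}"

    def cost_usd(self, raw):
        return raw["dbu"] * 0.55  # assumed dollars per DBU

def insight(adapter: PlatformAdapter, raw: dict) -> str:
    """Same user-facing message regardless of the underlying platform."""
    return f"{adapter.compute_unit(raw)} spent ${adapter.cost_usd(raw):.2f} this period"

print(insight(SnowflakeAdapter(), {"warehouse": "REPORTING_XL", "credits": 40}))
print(insight(DatabricksAdapter(), {"cluster_id": "etl-prod-01", "dbu": 900}))
```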
John: You mentioned one thing, which is being able to support multi-platform. Do you see data teams supporting multiple platforms, and even multiple clouds, as a more common use case these days? That seems like one of the shifts we're seeing; I'd be curious to get your perspective on that as well.

Kunal: Oh, 100%. What we hear in the news is these platforms building out every capability so that data teams don't have to use the other platforms. But in practice one platform is good for AI, another is good for SQL, and that's how data teams are choosing: they pick best of breed and put these systems together to get the outcomes they need. In other cases, you may have very large organizations with different departments and business units that started their data and AI journeys at different times, and they simply end up on different clouds or different platforms regionally, so there's organic growth that leads to multiple platforms as well. At Unravel, we've always bet that there will be no one-size-fits-all: there won't be one platform, one database, one data lake, one programming language, or one type of ingestion platform that's it for everybody, because they're all doing different things at different scales for different needs. And that's the beautiful part about our technology stack, our industry's technology stack, not just Unravel's: it's so modular. I think that's what spurs innovation. You can focus on one thing, get really, really good at it, best in class, and then you have an option for that particular technology, and the next one rolls around and keeps everybody honest and moves innovation along much faster as well. That becomes one of the advantages of Unravel: because we support such a wide variety of data systems and leading platforms, customers don't have to think about, hey, how will I get that quality of service on the other platform? How will I get the best performance and cost optimizations from this other platform? They can bank on it: no matter where I go, I'll always have Unravel inside to make sure everything runs properly and scales efficiently.

John: Absolutely. What are some of the steps you've taken to make Unravel itself, as a data platform, modular and portable between clouds and platforms?

Kunal: At Unravel's core is an AI-powered Insights Engine, which is built and trained to understand all the intricacies of data pipelines, data algorithms, and AI models on all the different platforms we've talked about. We first built an engine that ingests millions of data points from across the stack. That was perhaps the biggest piece to build: connectors to all of these different systems, and the ability to get that data to a central place, in a correlated fashion, so you can then apply all of these insights and AI models on top of it. It's no different from our customers using their own stacks to solve their data problems; they're probably combining customer data and sales data to create insights. Our data is telemetry data: we want to understand information about your apps, your code, your datasets, and the infrastructure and services you're using. Once we have all of that in one place, we can run our insights and algorithms on top of it. So that's what makes it modular: you have all these different pieces collecting data depending on the system you're on, and the data model itself is standardized, so no matter where you ingest from, it lands in the same shape. Then there are multiple ways to use this data. You have dashboards and visuals to understand the what: an application performance management screen, a pipeline performance management screen where you can understand the behavior of a pipeline or a workflow, or a view for chargeback and showback in your multi-tenant data ecosystem. Those are visuals that help you drill down, play with the data, and explore what people are doing in these environments. And then you have the algorithms to solve specific classes of problems, which we've been training on millions and millions of logs from jobs, data pipelines, and AI algorithms. What we're ultimately solving for is reliability, making sure things work as advertised, on time, every time; performance; and efficiency, which is the cost part of the equation. That's the simplest way to think about it. And then, of course, on different platforms and for different types of applications, you may have more intricate algorithms to deal with specific sets of problems.
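(Editor's note: a standardized data model of that sort might look something like the sketch below; the field names and credit rate are hypothetical, chosen only to show telemetry from different platforms landing in one shape.)

```python
from dataclasses import dataclass

@dataclass
class TelemetryRecord:
    """One normalized event, whichever platform it came from."""
    platform: str      # e.g. "snowflake", "databricks", "bigquery"
    user: str
    pipeline: str
    compute_unit: str  # warehouse, cluster, or reservation name
    runtime_s: float
    cost_usd: float

def normalize_snowflake(row: dict) -> TelemetryRecord:
    """Map a hypothetical Snowflake query-history row into the common model."""
    return TelemetryRecord(
        platform="snowflake",
        user=row["user_name"],
        pipeline=row["query_tag"] or "adhoc",
        compute_unit=row["warehouse_name"],
        runtime_s=row["elapsed_ms"] / 1000.0,
        cost_usd=row["credits_used"] * 3.0,  # assumed dollars per credit
    )

print(normalize_snowflake({
    "user_name": "alice", "query_tag": "daily_etl",
    "warehouse_name": "ETL_WH", "elapsed_ms": 95_000, "credits_used": 1.2,
}))
```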
John: That's excellent. And, you know, you had a great article published by the Forbes Technology Council about optimizing costs as a crucial step to AI and machine learning success. Why is optimizing costs the prerequisite there?

Kunal: The only thing getting funding in budgets these days is AI, and people are spending money on AI like it's going out of fashion. So the prediction I made there is that people will start asking questions about the ROI of AI, and for that we need to understand where the costs are going and whether we're spending money the right way. That's one of the predictions we'll definitely see play out this year: people asking, am I getting the bang for the buck? I've put down hundreds of millions of dollars on my AI and data analytics endeavors; what is the return on that investment? To understand the return on investment, you first need to understand whether you're spending money the right way, and whether there are ways to optimize it. Number two is that people will actually start to think about AI outcomes as a business endeavor, the way OpenAI is starting to make money from ChatGPT. When you think about it industry by industry and company by company, you've really got to put yourself in the mind frame that these are expensive endeavors. What am I improving in my company's operations, or what new thing am I offering customers that we can generate revenue from? And then, what's the unit cost of generating that incremental revenue, and am I getting the right ROI on it? So what we recommend is: measure everything, measure early, and watch where the wastage is going, whether that's infrastructure or code; look at everything end to end. Remember that not every AI model is going to make it into production, so discount for the ones that are experimentation, which is a good thing to do, but not everything will produce an ROI by itself. And then really try to understand how you derive a unit cost, so you can compare the return you're getting against the investment and the time you're putting in.
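(Editor's note: as a back-of-the-envelope illustration of the unit-cost idea, with entirely hypothetical numbers, you might divide the fully loaded cost of an AI feature, pipelines included, by the units of value it produces, then compare against the incremental revenue per unit.)

```python
def unit_cost(total_cost_usd: float, units_served: int) -> float:
    """Fully loaded cost per unit of value (e.g. per served request)."""
    return total_cost_usd / units_served

# Hypothetical monthly figures for one AI feature: model serving plus the
# data pipelines that feed it, upstream and downstream.
pipeline_cost_usd = 42_000.0
serving_cost_usd = 18_000.0
requests_served = 1_200_000

cost_per_request = unit_cost(pipeline_cost_usd + serving_cost_usd, requests_served)
revenue_per_request = 0.09  # assumed incremental revenue per request

print(f"cost/request    = ${cost_per_request:.3f}")   # $0.050
print(f"revenue/request = ${revenue_per_request:.3f}")
print("ROI positive" if revenue_per_request > cost_per_request else "ROI negative")
```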
John: It's really critical as these AI initiatives roll out that, one, there's measurable ROI, and two, the business is getting value out of it, whether internally by optimizing your processes or externally by earning revenue from customers in exchange for some AI-driven experience. I think right now most companies are still exploring. A lot of analyst reports have come out, and most of them say that fewer than 10 percent of companies have truly adopted AI at scale in a way that's really generating ROI for the business. And experimentation, as you pointed out, can be very expensive; the cost of these services can run pretty high, and you talk about observability and having an AI-driven tool actually monitor that. So what's your advice to teams that are embarking on their journey, from a FinOps and cost optimization perspective?

Kunal: Yeah. There are three models in which people employ AI: you can take the off-the-shelf models and run them for your use case; you can take the off-the-shelf models and enhance them for your use case; or you can create your own underlying models. As you go from the first to the last, it keeps getting significantly more expensive, not only to create, but to manage, maintain, run, and expand from there. So, number one, figure that out. And the same wisdom applies as with adopting any other technology: start with the end use case in mind. I know people are rushing into it because Wall Street is deciding between the haves and the have-nots based on whether you've got AI, and every company is rushing to do something with it. But what we've seen in the customers we work closely with is that you start to get a lot of return on investment when you focus on the operational side: just improving operations in your company with AI. When we think about AI, we're always imagining some amazing innovation, but start by looking at your company's processes and figuring out what can be automated and what can be improved. Can I improve customer service? Can I improve the way we do sales, or the way we do marketing? Can I reduce some of the expenses, or the wastage, happening inside my company with the advent of AI? Those are the first places to work, where you'll get a good ROI, while also investing in some innovation: creating new products and services that don't exist yet, or incremental improvements to what you currently offer. The way to know whether you're getting a good return on investment is to measure everything. And when we say everything, it's not only the AI models; it's the data pipelines feeding data into those models, the two, three, four, five steps before the AI models on the left, and the two, three, four steps after them on the right. You really have to understand that, together with the technology team, to get the true cost of delivering these AI products to market. So again, the advice is to measure everything, and measure early and often enough that you have a real understanding of where your resources are being spent. Once you understand that, you can start to see the opportunities to actually improve and optimize things.

John: Excellent. And what's your vision for the future of FinOps?

Kunal: Every cloud company is going to adopt a FinOps practice, and it's going to be, it has to be, more cultural rather than just a standard set of practices. What I mean by cultural is that it only works if everybody in the company is aligned to it: not just leadership, not just the system admins or cloud architects, but all of the engineers and all of the constituents who actually write applications, pipelines, and algorithms on these cloud platforms and systems. So, number one, there will be a cloud cost consciousness movement to further the adoption. That's starting already: companies that began their FinOps journeys a year or two ago are now settling in and exposing the entire company to why they need these FinOps practices, and getting alignment with all the engineers, because you need that alignment; you can't make changes without shifting left in your organization anyway. The second thing is that FinOps is going to go from just understanding where costs are going to optimizing them in a much more automated fashion.
Understanding is a good first step, and most FinOps companies have it. At Unravel, we've had the advantage of focusing on automation since the inception of the company, so we've already gone beyond just telling you the what to the why and the how of solving certain problems. But we'll see the entire FinOps practice start to move in that direction: being able to compare how much you are spending against how much you could be spending if you had optimized things and eliminated wastage, and then incentivizing your users to take the actions that get you to that desired state. I think that's going to be the key for FinOps: aligning incentives across different teams toward that common goal.

John: Excellent. And there's going to be a lot of innovation in this space. Teams moved very fast to adopt the cloud, and enterprises have now moved to multiple clouds; like you mentioned, multi-platform and multi-cloud is almost the default I see with large enterprise customers now. So having a good understanding of FinOps across platforms, with continued evolution and innovation coming to market, is going to be absolutely critical. I'm glad we can follow along with your work, since you're a thought leader in the space. That being said, Kunal, where should the listeners follow along with your writing?

Kunal: Yeah, John, they can go to unraveldata.com, where you'll find a ton of resources and information about how companies are creating best practices. Like you said, we're in the early innings, so there are people who have tried and tested certain things, and we can learn from them, peek over their shoulders, and see what worked for them and what may work for us. We share those stories as often as we can find them, publish them, and drive success with our customers. The site also gives you ideas about how to implement something depending on where you are in the journey: if you're very early, where do you get started without being overwhelmed? If you already have some solutions in place, how can you supercharge the FinOps journey for your entire team? And there are ideas about building up the cultural aspect, for example, not just the technology-related pieces. So you'll find all of that and more at our website, unraveldata.com.

John: Kunal Agarwal, CEO and co-founder of Unravel Data, it was great having you on the podcast today. We'll have links to unraveldata.com and your LinkedIn bio in the show notes for those who want to follow along with your story. Thanks so much for joining, and thank you to all the listeners who tuned in.

Kunal: Thank you, John. This was fun.

Chapter Markers

FinOps and Data AI Simplification
Data Platform Modularity and Efficiency
Maximizing ROI With AI and FinOps