The Dashboard Effect

Laying the Groundwork for Gen AI in Business Analytics

Brick Thompson, Jon Thompson, Caleb Ochs Episode 122

Brick and Caleb explore the steps businesses must take to prepare their data for the inevitable integration of generative AI analytics. They use this episode to explain the "why" behind data consolidation and semantic layers. 

What you'll learn:

  • Why consolidating your data is the first step for AI-readiness
  • The challenges and solutions for dealing with data from disparate sources
  • The importance of a semantic layer in making data accessible for AI and business users
  • Why you should take an agile approach when developing data infrastructure

Blue Margin increases enterprise value for PE-backed, mid-market companies by serving as their fractional data team. We advise on, build, and manage data platforms. Our strategy, proven with over 300 companies to date, expands multiples through data transformation, as presented in our book, The Dashboard Effect.

Brick Thompson:

Welcome to The Dashboard Effect Podcast. I'm Brick Thompson.

Caleb Ochs:

I'm Caleb Ochs.

Brick Thompson:

How's it going, Caleb?

Caleb Ochs:

Pretty good.

Brick Thompson:

So we wanted to talk today about something we've talked about in the past, which is getting ready for generative AI analytics. We're not there yet. We don't know exactly when it's gonna get there, and there are still advances being made. But we think it's important for businesses to get their data act together so that they're ready when it comes.

Caleb Ochs:

Yeah, we've said it a few times. You don't want to get caught with your pants down, I guess.

Brick Thompson:

So, I mean, some of this is gonna be a little elementary for some of our listeners. But for those who haven't thought about this, I want to make sure we cover it; maybe we'll do this in a couple of short episodes. The first place people need to start is understanding consolidation of data. By this, I mean that you're not expecting whatever tools you're using, whether it's a generative AI tool or Power BI, to go out and connect to all of your different data sources (your ERP, your CRM, your timekeeping, whatever you may have), but rather you pull that data together so that it's all in one place. And I just wanted to talk for a minute about why that's important.

Caleb Ochs:

Yeah, you know, when Power BI came out, and some of the other tools like Qlik or Tableau, that was one of the main selling points: "We can connect to a bunch of different sources." The point is that the challenge of pulling together data from disparate sources has always been around, and there have been attempts at solving it in Power BI. Those tools do a pretty good job until you start doing really complicated things, which is exactly why you need a separate, consolidated source even for feeding something like Power BI. You want to feed Power BI from a single source in almost every case, because once you get into complicated logic, Power BI itself is going to fall down and not be able to do everything you need it to do. I think that's very analogous to what you're talking about with AI. It could probably connect to everything, but there are gonna be many more limitations if it has to go do that, rather than just going to one place where the data is already cleaned up for the AI to look at. You're just gonna have a better outcome.

Brick Thompson:

Yeah, if you're connecting whatever system you're using to multiple sources, it's going to be difficult to do the modeling. Let's say you're using Power BI: it's hard to do the modeling so that the different sources line up well with each other, to put it simply, so that you can actually run queries, ask questions, and have visualizations that span multiple data sources. You can do it with all those connections, but it's going to be harder. If the only thing you're connecting on is, say, a date ID, okay, that's pretty easy. But if you've got ERP data and CRM data, and they both have customer names in them, you're going to need those to line up if you want reporting that deals with both sets of data. And in order to do that, you're going to need to consolidate it, so that you can do whatever transforms you need on the data or create whatever views you need for the reporting to work well.
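To make the customer-name problem concrete, here is a minimal Python sketch of lining up ERP and CRM records once both extracts sit in one place. The records, the `customer_key` normalization rule (lowercasing, stripping punctuation, dropping a hypothetical list of legal suffixes), and the function names are illustrative assumptions, not a method described in the episode; real matching usually needs more care, such as fuzzy matching, a maintained mapping table, or a master customer ID.

```python
import re

# Hypothetical extracts that have already been landed in one place (e.g. a data lake).
erp_invoices = [
    {"customer": "Acme Corp.", "amount": 1200.0},
    {"customer": "Globex, LLC", "amount": 800.0},
    {"customer": "ACME CORP", "amount": 300.0},
]
crm_accounts = [
    {"account_name": "Acme Corp", "owner": "Dana"},
    {"account_name": "Globex LLC", "owner": "Lee"},
]

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def customer_key(name: str) -> str:
    """Normalize a customer name into a simple matching key (an assumed rule, not a standard)."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def revenue_by_owner(invoices, accounts):
    """Join ERP invoice amounts to CRM account owners on the normalized key."""
    owner_by_key = {customer_key(a["account_name"]): a["owner"] for a in accounts}
    totals = {}
    for inv in invoices:
        # Invoices whose customer has no CRM match are grouped under "unmatched".
        owner = owner_by_key.get(customer_key(inv["customer"]), "unmatched")
        totals[owner] = totals.get(owner, 0.0) + inv["amount"]
    return totals


print(revenue_by_owner(erp_invoices, crm_accounts))
```

Because both sources are queried from the same place, the matching rule can be rerun automatically whenever either side changes, instead of being redone by hand in a spreadsheet.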

Caleb Ochs:

Exactly. Right.

Brick Thompson:

All right. So that's where you start. And that's easier now than it's ever been. Data lakes make it really simple. You can consolidate your data without having to do every bit of organization that you might have had to do, say, five or 10 years ago, when you were building up a Kimball-style SQL Server database.

Caleb Ochs:

Yeah, I mean, I think that's important to call out. You don't have to do everything you just mentioned, like matching customer names to customer names, right away. Step one is getting the data into the same place, because that makes the process of matching customer names, or doing that cleanup work, easier when people have to do it. If you have to go query HubSpot for your customer names, then query your ERP for your customers, pull that together in Excel, and do all your stuff, and then do it all over again once you've made changes, that's kind of a pain and it slows everything down. So if you can put it into a single spot, you can start doing some of that in a more automated fashion, and then you can start more of the cleaning and the quality work...

Brick Thompson:

As needed. You don't have to do it all at once. You do it as you have an area that you want to explore. Okay, you do that.

Caleb Ochs:

Right, right. I mean, that's actually a mistake to try and do it all at once. Because it's gonna be too much work, and you're just gonna probably give up.

Brick Thompson:

Well, and you're gonna miss, because by the time you've finished, the requirements, or what you needed, will have changed. Ten years ago, the knock on data warehouses was, okay, first of all, it took twice as long as we thought. It took 18 months, and by the time we finished it, the business had changed, and so had the requirements we came up with. So you want to be a lot more agile than that. All right, so after you get the data into a data lake and you've got it all consolidated, the next step is to enable tools, whether software tools, generative AI, or even your analysts, to find the data they want. And you do that by creating (this is gonna sound complicated) a semantic model or a semantic layer. See if you can explain that simply.

Caleb Ochs:

So the easy way to think about it is... let's just take an ERP system. It's going to have tables and objects behind it, like your raw customer table, then maybe your Invoice Header, your Invoice Detail, a GL table, stuff like that. Sometimes those are named intuitively, sometimes they aren't. Sometimes they're nicely laid out for analytics, and sometimes they aren't. But basically, what you want to do is make those tables intuitive and understandable and put them into what I would call a dimensional model, with dimensions and facts. That's almost a metadata layer on top of your data lake, so when people go to look at your data lake, it makes sense to them. They're not just seeing a bunch of raw tables.
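A dimensional model like the one described here can be sketched as SQL views over raw tables. This example uses Python's built-in `sqlite3` module as a stand-in for whatever query engine sits on the data lake; the raw table and column names (`cust_nm`, `inv_dt`, and so on) and the view names `dim_customer` and `fact_invoice_line` are hypothetical. The views rename cryptic columns and pull invoice-header attributes down to the invoice-line grain, so a consumer can ask for revenue by customer without touching the raw tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw ERP tables, as they might land in the lake.
CREATE TABLE customer (cust_no TEXT PRIMARY KEY, cust_nm TEXT, terr_cd TEXT);
CREATE TABLE invoice_header (inv_no INTEGER PRIMARY KEY, cust_no TEXT, inv_dt TEXT);
CREATE TABLE invoice_detail (inv_no INTEGER, line_no INTEGER, item_cd TEXT,
                             qty REAL, unit_price REAL);

INSERT INTO customer VALUES ('C1', 'Acme', 'W'), ('C2', 'Globex', 'E');
INSERT INTO invoice_header VALUES (100, 'C1', '2024-01-05'), (101, 'C2', '2024-01-07');
INSERT INTO invoice_detail VALUES (100, 1, 'X', 2, 50.0), (100, 2, 'Y', 1, 20.0),
                                  (101, 1, 'X', 3, 50.0);

-- Dimension: one friendly row per customer.
CREATE VIEW dim_customer AS
SELECT cust_no AS customer_id, cust_nm AS customer_name, terr_cd AS territory
FROM customer;

-- Fact: one row per invoice line, with header attributes pulled down to that grain.
CREATE VIEW fact_invoice_line AS
SELECT h.inv_no AS invoice_number, h.inv_dt AS invoice_date, h.cust_no AS customer_id,
       d.item_cd AS item, d.qty AS quantity, d.qty * d.unit_price AS line_amount
FROM invoice_header h JOIN invoice_detail d ON d.inv_no = h.inv_no;
""")

# A consumer only needs the friendly views, not the raw table layout.
rows = conn.execute("""
SELECT c.customer_name, SUM(f.line_amount) AS revenue
FROM fact_invoice_line f JOIN dim_customer c ON c.customer_id = f.customer_id
GROUP BY c.customer_name
ORDER BY c.customer_name
""").fetchall()
print(rows)
```

Views are used here because they add no copies of the data; some platforms materialize the dimensional tables instead, which is a performance trade-off rather than a change to the model itself.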

Brick Thompson:

They're not having to deal with some obscurely named table, like you might see in a system like Sage. I think in Sage they'd use a letter and several numbers to name a table.

Caleb Ochs:

Yeah, there are a few systems that are like that.

Brick Thompson:

Yeah. Or dealing with hundreds of tables and trying to figure out what you need. Your BI people, your data engineers, should build you a semantic layer that greatly simplifies that, so that you're looking at a handful, or maybe tens, of tables total. And then all that wiring back into the data is behind the scenes; the user and the system don't have to deal with it. Power BI doesn't have to know the wiring, it just needs to know that interface layer.
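One way to picture that "interface layer" is a mapping from business names to physical tables and columns, plus a small step that compiles a request into SQL. This Python sketch is an illustrative assumption only: the table codes (`AR0010`, `OE0220`) and column codes are invented to mimic the letter-and-number naming mentioned above, and the dictionary layout is not the format of any particular BI tool. The consumer (Power BI, an AI tool, an analyst) only asks for "Revenue" by "Customer Name"; the joins and cryptic names stay behind the scenes.

```python
# Hypothetical semantic model. The physical table and column codes are invented
# to mimic cryptic ERP naming; consumers only ever see the friendly names.
SEMANTIC_MODEL = {
    "tables": {"Customer": "AR0010", "Invoice Line": "OE0220"},
    "dimensions": {
        "Customer Name": ("Customer", "NM01"),
        "Territory": ("Customer", "TR01"),
    },
    "measures": {"Revenue": ("Invoice Line", "SUM(f.QT01 * f.PR01)")},
    # Join keys between tables: the "wiring" that stays behind the scenes.
    "joins": {("Invoice Line", "Customer"): ("CU01", "CU01")},
}


def compile_query(model: dict, measure: str, by: str) -> str:
    """Translate a business request (measure by dimension) into SQL on the raw tables."""
    fact_name, expr = model["measures"][measure]
    dim_name, dim_col = model["dimensions"][by]
    fact_key, dim_key = model["joins"][(fact_name, dim_name)]
    fact_tbl = model["tables"][fact_name]
    dim_tbl = model["tables"][dim_name]
    return (
        f'SELECT d.{dim_col} AS "{by}", {expr} AS "{measure}" '
        f"FROM {fact_tbl} f JOIN {dim_tbl} d ON f.{fact_key} = d.{dim_key} "
        f"GROUP BY d.{dim_col}"
    )


sql = compile_query(SEMANTIC_MODEL, "Revenue", "Customer Name")
print(sql)
```

If the raw tables are renamed or restructured, only the mapping changes; every report or tool that asks for "Revenue" by "Customer Name" keeps working unchanged.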

Caleb Ochs:

Exactly. Right.

Brick Thompson:

Okay, so that sounds like it might be complicated. And it could be, but again, you can do this in a very iterative manner. You can say, okay, I just need a report on invoice totals. I can ask my data engineers to build me a really simple semantic model that gives me just that, or that's a subset of a model that already exists. You can also have lots of different models for the same consolidated store of data. So the Sales department may use a different model than, say, the Finance department or the Ops department uses. And that semantic model, that semantic layer, is critical in getting to the point where we think generative AI will be able to easily give good answers to your business.
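The idea of several departmental models over one consolidated store can be sketched in a few lines of Python. The shared invoice rows, the field lists for `sales` and `finance`, and the helper names are hypothetical; in practice each model would more likely be a set of views or a semantic model in a BI tool than a Python dictionary.

```python
# Hypothetical consolidated store: one set of invoice lines shared by every model.
INVOICE_LINES = [
    {"invoice": 100, "customer": "Acme", "rep": "Dana", "gl_account": "4000", "amount": 100.0},
    {"invoice": 100, "customer": "Acme", "rep": "Dana", "gl_account": "4010", "amount": 20.0},
    {"invoice": 101, "customer": "Globex", "rep": "Lee", "gl_account": "4000", "amount": 150.0},
]

# Each department's model exposes only the fields it cares about (illustrative choices).
# Every model includes "amount" so totals can be computed from it.
MODELS = {
    "sales": ["customer", "rep", "amount"],
    "finance": ["invoice", "gl_account", "amount"],
}


def view_for(department, rows):
    """Project the shared rows down to the fields a department's model exposes."""
    fields = MODELS[department]
    return [{f: r[f] for f in fields} for r in rows]


def total_by(department, field):
    """Sum 'amount' by one of the fields that the department's model exposes."""
    if field not in MODELS[department]:
        raise KeyError(f"{field!r} is not in the {department} model")
    totals = {}
    for row in view_for(department, INVOICE_LINES):
        totals[row[field]] = totals.get(row[field], 0.0) + row["amount"]
    return totals


print(total_by("sales", "rep"))           # revenue by sales rep
print(total_by("finance", "gl_account"))  # revenue by GL account
```

Adding another entry, such as an invoice-totals model exposing only `invoice` and `amount`, is the kind of small, iterative step described above: the underlying rows do not change, only what each audience sees.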

Caleb Ochs:

Yeah, and by the way, it's really helpful for your business in general to have that right now. You don't have to wait for AI.

Brick Thompson:

Right, right. Well, in fact, once you have just what we've described, you can start doing some good old-fashioned AI, like machine learning. Or point your data scientists at it and have them, you know, go nuts with Python and figure things out in there. Or do predictive things: human-generated predictive analytics, that type of thing.

Caleb Ochs:

Yeah, exactly. Right.

Brick Thompson:

All right. We're going to try to keep these short, so we'll go ahead and sign off on this, and we'll come back in our next episode and talk about what you do after that.

Caleb Ochs:

Sounds good.