The Dashboard Effect

The Layman's Guide to Data Terminology: Part Two

Brick Thompson, Jon Thompson, Caleb Ochs Episode 120

This episode is the second installment of Brick and Caleb's series on breaking down data terminology. They cover Kimball data warehouse methodology, fact/dimensional tables, star vs. snowflake schemas, and data models vs. semantic layers. 

Click here to watch this episode on our YouTube channel.

Blue Margin increases enterprise value for PE-backed, mid-market companies by serving as their fractional data team. We advise on, build, and manage data platforms. Our strategy, proven with over 300 companies to-date, expands multiples through data transformation, as presented in our book, The Dashboard Effect.

Subscribe here to get more episodes of The Dashboard Effect podcast on your favorite podcast app.


Visit Blue Margin's library of additional BI resources.

Brick Thompson:

Hey welcome to The Dashboard Effect Podcast. I'm Brick Thompson.

Caleb Ochs:

And I'm Caleb Ochs.

Brick Thompson:

Caleb, for this episode, we just wanted to continue defining terms that we use all the time in this podcast that a lot of our listeners will know about, but some may not. And we think it'll be useful.

Caleb Ochs:

Yeah, let's do it.

Brick Thompson:

All right, there's a term we use now and then because we used to build all of our data warehouses this way for a decade or more, which is "Kimball style data warehouse". Ralph Kimball. So tell us about what does it mean when someone says Kimball style?

Caleb Ochs:

So Kimball, is kind of the pioneer behind facts and dimensions, like that type of thinking. So what that means is, there's another term that we should all have a voice, but we

need to put on there:

dimensional modeling. So, and when we refer to something as dimensional model, it's, you have "dimensions", you have"facts", which we'll get into a second. But it's essentially a way of organizing your data, doing that denormalization of pulling tables together into a format that Kimble kind of created, right.

Brick Thompson:

As sort of a standard that helps you get to the right level of denormalization, for efficiency in terms of operations, but also efficiency for users to be able to understand it. So you talked about fact and dimension tables. A "fact table" would be one that holds what we would think of as a transaction, like a sales transaction, a "dimension table" would be a thing that holds descriptors for that

Caleb Ochs:

That's right. Yeah. I mean, one of the main reasons transaction. So the sales transaction might have a customer ID associated with it. And then you might have a dimension table that has all of the customer data in it. Still, denormalized, you bring all the address and all that in there. But you don't want to have to repeat all of that data, even though you're denormalizing. You don't want to repeat all of it in the transaction -- the fact table. why you wouldn't want to do that is that you might have more than one fact table, right? So you just want to have your customer ID there. So you can use in your visualization tool, you can click on like, "Okay, I just wanna see this customer and see information from both of those fact tables - so both transactional tables - without having to choose a customer in Table A and a customer in Table B.

Brick Thompson:

Okay, so that just made me think of a definition we didn't write down. But so, "dimensional model" is basically a model that has these facts. And let's see, let's start again. So what's the dimensional model?

Caleb Ochs:

So a "dimensional model". The easiest way I think about it is there are dimensions. So dimensional dimensions, and then facts. So dimensional model is you've essentially taken, you've got a fact table and multiple dimensions, kind of surrounding your fact table.

Brick Thompson:

Yeah. Okay. All right. So next thing, let's talk about what is a data model? I mean, I think we've talked about a dimensional model. But what is a "model"?

Caleb Ochs:

Yeah. That's a good question. (So that actually makes me think of something that's further separated. But these things are kind of used interchangeably. So I don't think it's worth going into the the specifics of the differences.) So if we just talk about a data model, that is typically... it's set up in a way that shows you the relationships between tables, so as you were describing your transaction with a customer on it - you'd have your customer ID and your fact table - but then it's connected to your customer table. There's something else we'll talk about a second called a relationship that pulls that makes that connection. So you can do analysis across the two? So that is what we would refer to as a data model is like a way of organizing that data and having the relationships defined between the tables. So you can actually do something with it.

Brick Thompson:

So "model" is the same as a "data model". And it might be dimensional, might not.

Caleb Ochs:

Yeah, right. Right. Exactly. Technically, you could have a single table data model.

Brick Thompson:

Right. Yeah. Okay. All right. So what's a"relationship" then?

Caleb Ochs:

So that relationship is really what just what I described. Where you have a customer table, a transaction table, you have a customer ID on your transaction table, and then in your customer table, you have a customer ID. And there's a connection, essentially a join between the two tables and we would call that a relationship.

Brick Thompson:

Okay. And so that allows the system the SQL system to know how to connect these tables. All right. So that leads me to then what is a "star schema"? Which is basically a type of a model. We might be going in circles here.

Caleb Ochs:

So star schema... I think the best way to talk about a star schema is to talk about a star schema and a "snowflake schema" together.

Brick Thompson:

Yeah, okay, good.

Caleb Ochs:

So you've got, just picture this. So you have a transaction table, and you have a customer table, and then maybe you have a date table. Now, you probably have a couple other tables. But if you have your transaction table in the middle of your screen, and you've got a customer table, kind of up and to the right, you've got your date table up into the left, and then maybe you have a couple, there's like an item table down to the right. And maybe something else down to the left, it starts to start out kind of look like a star. Right? So you've got your single fact table, and then you've kind of got your prongs of your star as your dimensions.

Brick Thompson:

Dimensions. Yeah.

Caleb Ochs:

So as soon as you add another table, let's say you have like a, like a customer address. So you've got your address table, and let's say you just connect it to your customer tables. So you've got a customer ID on your address, or maybe an address ID on your customer table, and it connects to your address table. Now you have kind of two steps removed from your transaction table. As soon as you do that, that's what's called a snowflake. So now you're now you're working off a snowflake model, not a star schema model. Yeah.

Brick Thompson:

Okay. I'm calling back to an earlier definition, a "star schema model" is the one that's very denormalized. We've pulled all the data in from all the tables to a few that are fitting into this dimensional model. And a"snowflake schema" is one that's normalized, meaning we're breaking the data up into lots of different table.

Caleb Ochs:

Right, exactly.

Brick Thompson:

Okay, great. Okay, so we've been talking about models. In our podcasts the last few weeks, we've talked a lot about "semantic layers" and "semantic models". Is there any difference between those and what does that mean?

Caleb Ochs:

The best way I can describe that semantic model versus a data model is (at least in our world), I think they are very, basically the same thing. However, I would call this semantic model, at least in our architecture, really more the queries and the tables that are denormalized, before they actually hit your tool where you're going to actually create real relationships between tables.

Brick Thompson:

A star schema.

Caleb Ochs:

Right, right. So like, when you have a relationship between customer and transaction table, you make that relationship actually work real time in a tool like Power BI inside of Power BI, right. So we would typically call the model, something that's in Power BI. The semantic layer, though, is something that is consumed by Power BI. So you would consume your semantic layer with Power BI, and then build the model inside of Power BI by creating relationships.

Brick Thompson:

Okay, so the semantic layer or semantic model, which makes it confusing to use model again, but is just that layer on top of a data lake that allows Power BI to sort of see a "table view" of the data that's sitting in the data lake, then within Power BI, you're going to actually create your reporting data model, your star schema. Well said. Okay, I hope that made sense. This seems confusing. While we're on that, we probably should have covered this earlier. What's a "view"?

Caleb Ochs:

So a "view" stores code. It stores your T-SQL code that selects data from a table. So Power BI allows you to just grab data straight from a table, but essentially all that's doing is it's generating a SQL query against that table, right? So a view is, basically you're just taking out that automation from Power BI, which is what you want to do. Because you might want to do other things than what Power BI generates when you're pulling data from that database. So a view is just stored SQL.

Brick Thompson:

it's sort of an in between the raw tables and where you're going to go with the reporting data model. You just have a shortcut, which is this stored SQL that creates a view. It looks like a table, but it's not a physical table.

Caleb Ochs:

Yeah.

Brick Thompson:

But that, but then when you get to data lake, the whole semantic layer is really "views"... sort of. I mean, we don't refer to them that way.

Caleb Ochs:

Yeah. I mean, technically they are right. They for sure are. I mean, really, it's a view of some underlying data. So we've always written views on top of even our denormalized data warehouse tables. It just keeps you nice and flexible. So going to a data lake model with a semantic layer on top, it's really the exact same thought process. Yeah. And they're both views.

Brick Thompson:

Yeah. So in our old data warehouses sitting on SQL Server, we'd have raw tables. We would create a whole layer of views, reporting views. And we're doing the same thing in a data lake, but we're calling that now a semantic layer - the views. I think we're out of time. So I think we'll wrap this for now, and we'll come back and do another episode of this if people find it useful.