Informonster Podcast

Episode 31: Data Quality in Healthcare: Decoding the PIQI Framework

Clinical Architecture Episode 31

Tune in as Charlie Harp unveils the rationale behind the Patient Information Quality Improvement (PIQI) framework's streamlined data model and explains how it turns complex patient information into an organized format for analysis. The next episode will cover the taxonomy used in the PIQI framework to identify the root causes of data issues and provide recommendations for improvement.

Contact Clinical Architecture

• Tweet us at @ClinicalArch
• Follow us on LinkedIn and Facebook
• Email us at informonster@clinicalarchitecture.com

Thanks for listening!

Hi, I'm Charlie Harp and this is the Informonster podcast. On this episode, I'm going to continue my series on the PIQI framework, the Patient Information Quality Improvement framework. If you haven't already listened to the first episode, where we talk about the PIQI framework in general at a higher level, you should probably drop everything and go do that right now. In today's episode, we're going to talk about the data model that we're using for the PIQI framework, why we chose what we chose, and some of its basic parts. I'm not going to get into the detail. I'm not going to read through a list of elements and attributes, so if you're into that, my apologies. But for most of you, you're welcome. So let's go ahead and get started. The first question is: why did we create yet another model for patient information?

And the truth is, I don't know that we created another model for PIQI. What we did was pull things out of most of the models that already exist. The PIQI model is a simple hierarchical model: it has a patient and then a bunch of elements organized into data classes. Each data class has its own record attributes that describe the thing in that data class. So drugs have dose amounts and dose units, and labs have lab results, lab values, and reference ranges. It's almost like a distillation of the core patient information that lives in just about any EHR system's data model or any message model. We just made it about the data. We are not imposing a bunch of rules, and there's not a lot of complexity.
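For readers who think in code, here's a minimal Python sketch of the hierarchy described here: a patient as a container, data classes as named groups, and elements as flat sets of attributes. The class names and attribute names (`dose_amount`, `reference_range`, and so on) are hypothetical illustrations, not the actual PIQI schema.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One record in a data class, e.g. a single medication or lab result.

    Attributes are a flat name -> value mapping rather than a nested schema
    (the attribute names used below are hypothetical).
    """
    attributes: dict = field(default_factory=dict)


@dataclass
class Patient:
    """The patient is just a container: data class name -> list of elements."""
    data_classes: dict = field(default_factory=dict)

    def add(self, data_class: str, element: Element) -> None:
        # Create the data class bucket on first use, then append the element
        self.data_classes.setdefault(data_class, []).append(element)


patient = Patient()
patient.add("Medications", Element({"dose_amount": 500, "dose_unit": "mg"}))
patient.add("LabResults", Element({"result_value": 7.2, "result_unit": "%",
                                   "reference_range": "4.0-5.6"}))

for data_class, elements in patient.data_classes.items():
    print(data_class, [e.attributes for e in elements])
```

Because the structure is this shallow, pulling data out of a richer source format is mostly a matter of picking out the right fields.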

It really is boiling away all the schema and all the syntax and just making it a very simple, organized list of attributes. Once again, the reason we did that is because we're not trying to transform the data. We're not trying to do anything magical with the data. We're just assessing it, so we can boil out these core attributes for each element, evaluate those attributes, and say, hey, these were good, these weren't good, and overall your score was this. That's why we did this. And the nice thing about the PIQI information model is that it should be fairly easy to extract the data from almost any format or any EHR schema. That's the why. The other reason is that, as anybody knows, if you tie something to a standard (and I'm not slamming standards, standards are great), standards change, and that's outside of your control.
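The "these were good, these weren't good, overall your score was this" idea can be sketched as a loop over elements and attribute checks. This assumes a simple pass rate as the score, which is an illustration only; the episode doesn't describe how PIQI actually weights or scores results, and the criteria below are hypothetical.

```python
from typing import Callable

# Hypothetical criterion: a check that returns True if the attribute value passes
Check = Callable[[object], bool]


def evaluate(elements: list[dict], criteria: dict[str, Check]) -> tuple[list, float]:
    """Run every criterion on every element and return (results, score).

    The score here is just the fraction of checks that passed (an assumption).
    A missing attribute counts as a failure.
    """
    results = []
    for index, element in enumerate(elements):
        for attribute, check in criteria.items():
            value = element.get(attribute)
            passed = value is not None and check(value)
            results.append((index, attribute, passed))
    passed_count = sum(1 for _, _, ok in results if ok)
    score = passed_count / len(results) if results else 0.0
    return results, score


medication_criteria = {
    "dose_amount": lambda v: isinstance(v, (int, float)) and v > 0,
    "dose_unit": lambda v: v in {"mg", "g", "mL"},
}
medications = [
    {"dose_amount": 500, "dose_unit": "mg"},
    {"dose_amount": -1, "dose_unit": "tablespoons"},
]

results, score = evaluate(medications, medication_criteria)
for index, attribute, passed in results:
    print(index, attribute, "good" if passed else "not good")
print(f"overall score: {score:.2f}")  # 2 of 4 checks pass -> 0.50
```

Keeping the per-attribute results alongside the score is what makes it possible to tell a source which specific things weren't good, not just how it scored.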

So my hope is that if standards change and introduce new attributes, and those become ubiquitous, we can always add them to the PIQI model. The PIQI model is not sacred. It's really designed just to collect the attributes that I want to evaluate and see whether they meet my criteria. Now, when we start talking about the actual structure and what's in these elements, there's something I wanted to do in this session. I know that some people who are not in health informatics listen to this podcast. I do not know why they do that. But for them, I'm going to try to give a bit of an explanation of some of the things we're doing when we pull data into this model. We in the informatics field throw around terms like "transitive closure" and "pre- and post-coordination", and sometimes we get wrapped up in our own terminology, as ironic as that is.

But there's nothing magical about patient data. The things we move around in patient data all boil down to the data types we see all over the place. You've got dates, integers, decimal values, strings of text, and fixed lists of values, which we call enumerations. These are used all the time, and anybody could look at a list in an Excel spreadsheet and say, oh yeah, that's a date, that's a time, that's a number, that's text. There is one thing, a computer-sciencey informatics thing, that most people roaming the earth do not encounter on a regular basis, and that's a code. So one of the things I want to explain to people who aren't familiar with it is what a code is, why it's important in healthcare data, and why it's a little more complicated than most people give it credit for.
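Here's a sketch of what "just saying whether it's valid" looks like for those everyday data types, assuming values arrive as text and dates use ISO 8601 format. The function names are hypothetical. Notice that none of these checks can tell you whether a code is right; at best a code passes as a non-empty string, which is why codes need their own treatment.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def is_valid_date(text: str) -> bool:
    # Assumes ISO 8601 calendar dates such as "2023-04-17"
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_valid_integer(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False


def is_valid_decimal(text: str) -> bool:
    try:
        return Decimal(text).is_finite()  # rejects "NaN" and "Infinity"
    except InvalidOperation:
        return False


def is_valid_text(text: str) -> bool:
    # A string of text is valid here if it isn't empty or all whitespace
    return bool(text.strip())


def is_valid_enumeration(text: str, allowed: set) -> bool:
    # An enumeration is just a fixed list of permitted values
    return text in allowed


print(is_valid_date("2023-04-17"), is_valid_date("2023-02-30"))  # True False
print(is_valid_integer("42"), is_valid_integer("4.2"))           # True False
print(is_valid_decimal("7.2"), is_valid_decimal("seven"))        # True False
print(is_valid_enumeration("F", {"F", "M", "U"}))                # True
```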

If you think about the way computers operate, they aren't great at language, and even with the advent of AI and NLP, it's still a fuzzy way to operate. There's still uncertainty when you ask a computer to deal with language. Codes are a different story: computers are very efficient when it comes to dealing with codes. You can think of a code as a concrete pillar of support. If the code is right and you give the computer the code, the computer can work wonders with it. So when we're pushing information around and trying to do analytics or things in high volumes, we use codes to make it super easy for the computer to make decisions and do what it does best. But a code is not as simple as it looks.
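To make the "computers are efficient with codes" point concrete, here's a small sketch contrasting an exact lookup on a code with a lookup on free text. The patients, the code "10", and the text variants are hypothetical, echoing the asthma example used in this episode.

```python
from collections import defaultdict

# Hypothetical problem-list rows: (patient, code, text as entered at the source)
rows = [
    ("Fred Jones", "10", "asthma"),
    ("Ann Smith", "10", "Asthma"),
    ("Raj Patel", "10", "Asthma, unspecified"),
    ("Lee Wong", "22", "heart failure"),
]

# Build one index keyed by code and one keyed by the raw text
by_code = defaultdict(set)
by_text = defaultdict(set)
for patient, code, text in rows:
    by_code[code].add(patient)
    by_text[text].add(patient)

# One dictionary lookup finds every patient carrying code 10
print(sorted(by_code["10"]))      # ['Ann Smith', 'Fred Jones', 'Raj Patel']
# The same lookup on text only finds the exact spelling "asthma"
print(sorted(by_text["asthma"]))  # ['Fred Jones']
```

Matching the text reliably would take normalization, synonym handling, or NLP, which is exactly the kind of fuzziness a correct code avoids.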

You might say, well, a code is a number, so I have a code that's a 10. That's true, and that 10 means something to the computer. Let's say that 10 is the code for asthma. If the computer sees a code 10 for Fred Jones, it says, hey, Fred Jones has a code 10, and code 10 is asthma. It can find all the people in the database with a code 10, and these are all your asthma patients. And it does that like greased lightning. The issue is that a code doesn't exist in a vacuum. For an analogy, let's treat human beings as if they're terms or concepts, and let's say that you, as a human being, have a code, and your code is your Social Security number, if you're here in the US. If you were to go into the Social Security office in a government building and they were to say, who are you?

You would say, I'm Charlie Harp. And they could ask you, what is your code? And you'd say, my code... well, I'm not going to read my Social Security number, so let's say it's 1, 2, 3, 4, 5, 7, 6, 8, 9. And they say, oh, you're right, you're Charlie Harp, because the Social Security number you gave us, the code, matches who you semantically said you were: Charlie Harp. Now let's say I walk into that building and they say, who are you? I say, I'm Charlie Harp. They say, what's your code? And I read them my driver's license number for the state of Indiana: my code is 4 5 7 1 9 8 3 1 1. By the way, that's not my real driver's license number. They could say, ah, you're Steve Jones, you're not Charlie Harp. Why are you lying to me? Or they could start treating me like Steve Jones. And Steve Jones is a troublemaker.

So we're going to haul you off. That notion of where a code came from, and what that source is called, is something we use in informatics all the time, and it's something that is often missing. It's what we call the code system. You can think of the code system as the dictionary the number came from. When somebody needed a code for asthma, they went to a dictionary and said, I need a code. And that dictionary, that code system, let's say it's SNOMED, said your code will be this. It is the source of record for the fact that code 10 means asthma. The code system is critically important when we're exchanging data. In fact, unless you have one ubiquitous code system in the sky, it's always important, because when I'm giving you a code, I'm really giving you three things, and all three are needed for it to be valid.

I have to give you three things. I have to give you the code system: here's where the code came from. I give you the code: it's number 10. And I give you the term: I'm saying it's asthma. Now, I'm going to talk about this in a little more detail, because I think that when it comes to patient data, the code is where we fail the biggest. I can't just give you the code and the code system. There might be people out there who argue, Charlie, if you give me the code system and the code, that's all I need to figure out what that thing is, because I'll go to the code system, look up 10, and say, this is asthma. And that's great. But what if, in the system of record where the data came from, they somehow got their wires crossed and somebody was able to override the name asthma with heart failure?

And what if I get that code and the code system? The code is 10, but the term itself is semantically different from what I would expect for code 10 in that code system. If that happens, the source thinks that 10 is heart failure, for whatever messed-up reason, but when I give it to you, you automatically say, "Nope, that's asthma." That's why the code is like a holy trinity of data. You must have a code system, you must have a code, and you must have a term, so that you can validate that the code and the code system mean what the source thinks they mean. The reason I'm hammering on this is that, when you think about the model we're talking about, pretty much every single element has a code, or more than one code, that it's sending as a payload.
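Here's a minimal sketch of the "holy trinity" check in Python. The code system name `EXAMPLE-CS` and its contents are hypothetical, mirroring the made-up "code 10 = asthma" from this episode rather than any real SNOMED CT content. Comparing terms by case-insensitive exact match is also a simplification; a real check would need to account for synonyms and alternate display names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical code system contents: code system -> {code: expected term}
CODE_SYSTEMS = {
    "EXAMPLE-CS": {"10": "asthma", "22": "heart failure"},
}


@dataclass
class CodedValue:
    code_system: Optional[str]  # where the code came from
    code: Optional[str]         # the code itself
    term: Optional[str]         # what the source says the code means


def check_coded_value(cv: CodedValue) -> list:
    """Return a list of issues; an empty list means the triple is consistent."""
    missing = [name for name in ("code_system", "code", "term")
               if not getattr(cv, name)]
    if missing:
        return [f"missing {name}" for name in missing]
    dictionary = CODE_SYSTEMS.get(cv.code_system)
    if dictionary is None:
        return [f"unknown code system {cv.code_system!r}"]
    expected = dictionary.get(cv.code)
    if expected is None:
        return [f"code {cv.code!r} not found in {cv.code_system}"]
    # Simplification: exact, case-insensitive term match (no synonyms)
    if expected.casefold() != cv.term.strip().casefold():
        return [f"term mismatch: source says {cv.term!r}, "
                f"{cv.code_system} says {expected!r}"]
    return []


print(check_coded_value(CodedValue("EXAMPLE-CS", "10", "Asthma")))
print(check_coded_value(CodedValue("EXAMPLE-CS", "10", "heart failure")))
print(check_coded_value(CodedValue(None, "10", "asthma")))
```

The second call is the wires-crossed scenario: with only the code system and the code, the receiver would silently read code 10 as asthma, but because the term comes along too, the disagreement surfaces.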

And when you look at how we share data, at interoperability, at all these things we're trying to do, the code is important. In the PIQI information model, there's a whole set of special rules and evaluations for codes that don't necessarily apply to other types of data, because with an integer or a decimal or a date, you're mostly just saying whether it's valid. Sure, you can say that a decimal value is way out of whack if you do plausibility checking on it, but only once you have the code that tells you what that value is. And once you have the code that goes with a date, you can say, oh, this date should not be in the future. So codes are these anchors into reality, and when we're dealing with this data, we have to make sure we know what's important.
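The idea that a decimal or a date only becomes checkable for plausibility once you know its code can be sketched as lookup tables keyed by (code system, code). Everything here is hypothetical: the code systems, codes, limits, and unit strings are made up for illustration, and a result of `None` means "no rule for this code, so not evaluated".

```python
from datetime import date
from typing import Optional

# Hypothetical plausibility limits keyed by (code system, code): (low, high, unit)
PLAUSIBLE_RANGES = {
    ("EXAMPLE-LAB", "BODY-TEMP"): (25.0, 45.0, "Cel"),
    ("EXAMPLE-LAB", "HEART-RATE"): (20.0, 300.0, "/min"),
}

# Hypothetical codes for events that have already happened (e.g. a performed
# procedure), so their dates should never be in the future
PAST_EVENT_CODES = {("EXAMPLE-PROC", "APPENDECTOMY")}


def is_plausible_value(code_system: str, code: str, value: float, unit: str) -> Optional[bool]:
    limits = PLAUSIBLE_RANGES.get((code_system, code))
    if limits is None:
        return None  # without a known code there is nothing to compare against
    low, high, expected_unit = limits
    return unit == expected_unit and low <= value <= high


def is_plausible_date(code_system: str, code: str, when: date, today: date) -> Optional[bool]:
    if (code_system, code) not in PAST_EVENT_CODES:
        return None  # no rule for this code
    return when <= today


today = date(2024, 1, 15)
print(is_plausible_value("EXAMPLE-LAB", "BODY-TEMP", 37.0, "Cel"))   # True
print(is_plausible_value("EXAMPLE-LAB", "BODY-TEMP", 370.0, "Cel"))  # False
print(is_plausible_value("EXAMPLE-LAB", "UNKNOWN", 37.0, "Cel"))     # None
print(is_plausible_date("EXAMPLE-PROC", "APPENDECTOMY", date(2030, 6, 1), today))  # False
```

The same number, 370.0, is neither plausible nor implausible on its own; the code is what turns it into something that can be judged.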

So I'm done. I'm going to stop. I'm going to climb down off my code soapbox and wrap it up because, like I said, I'm not going to read through all the elements. But you know what I will do? Let's go through the list of elements in this first draft version of the PIQI framework so that you have an understanding of what goes into it. Obviously you have a patient, and the patient is essentially a container for the information. When you're doing a data quality assessment, you do not need anything that would be considered PHI. Granted, you could say you want to validate the Social Security number or this or that, but in reality, I think PIQI works better if you do not pass PHI into the model. The only thing it has that's close to PHI is a date of birth, just to make sure the data is valid.

You could exclude that if you want and just not include it in your evaluation criteria. But what you really have is the patient, which is a container. You have a record of demographic data, which includes the usual suspects. Then you have medications, which are the medications the patient is currently taking or the medication history that's being passed. You have lab results, which are a collection of results with the usual things you would expect. You have conditions, which could be either diagnoses or problems. You have procedures, vital signs, allergies and intolerances, immunizations, medical devices, and general patient statuses, like whether the patient is pregnant or smokes, along with their values. None of that should be a surprise to anybody. Those are all in US Core and are things contemplated in USCDI.
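As a rough illustration of keeping PHI out of the assessment, here's a sketch that builds a PIQI-style payload containing only the data classes listed in this episode, trimming demographics down to a small allow-list with an option to drop the date of birth. The data class names, the allow-list, and the function are all hypothetical; the draft PIQI model may name and structure things differently.

```python
# Hypothetical names for the data classes listed in the episode
DATA_CLASSES = {
    "Demographics", "Medications", "LabResults", "Conditions", "Procedures",
    "VitalSigns", "AllergiesIntolerances", "Immunizations", "Devices",
    "GeneralStatuses",  # e.g. pregnancy status, smoking status
}

# Hypothetical demographic attributes kept for assessment; names, SSNs,
# addresses, and anything else not listed are dropped
ALLOWED_DEMOGRAPHICS = {"birth_date", "sex", "race", "ethnicity"}


def to_assessment_payload(record: dict, keep_birth_date: bool = True) -> dict:
    allowed = set(ALLOWED_DEMOGRAPHICS)
    if not keep_birth_date:
        allowed.discard("birth_date")
    payload = {}
    for name, content in record.items():
        if name not in DATA_CLASSES:
            continue  # top-level identifiers never reach the model
        if name == "Demographics":
            payload[name] = {k: v for k, v in content.items() if k in allowed}
        else:
            payload[name] = list(content)
    return payload


source = {
    "ssn": "000-00-0000",
    "Demographics": {"name": "Fred Jones", "birth_date": "1980-05-01", "sex": "M"},
    "Medications": [{"term": "metformin", "dose_amount": 500, "dose_unit": "mg"}],
}
print(to_assessment_payload(source))
print(to_assessment_payload(source, keep_birth_date=False))
```

Using an allow-list rather than a block-list means a new identifying field added upstream is dropped by default instead of leaking into the assessment.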

USCDI is really the driver for the things we're checking. You can go beyond the basics of USCDI when you're scoring something, but the model is really designed around the things we are being asked to make sure are of high quality. In the next episode of the Informonster podcast, we're going to talk about the taxonomy, which is what allows us to look at the likely root cause of data issues and the nature of the failures in the data. Because ultimately, with something like the PIQI framework, you want to do a couple of things. You want to score the data coming from a particular source, but you also want the tools to tell that source what it needs to do to make it better. It's not about, oh, I caught you, you have bad data. It's really about, we found issues with your data, and here's how you could probably make it a lot better. It's the combination of those things that will allow us to have iterative, incremental improvement over time and get to the point where we've all got beautiful, amazing data. That's the dream. And with that, I am Charlie Harp, and this has been another episode of the Informonster podcast. Thank you so much for listening, and we'll do it again in the not too distant future.