Informonster Podcast

Episode 32: Data Quality in Healthcare: Building the PIQI Taxonomy

Clinical Architecture Episode 32

In the third installment of the Data Quality in Healthcare series, Charlie Harp explores the taxonomy of the PIQI Framework. He breaks down the primary categories, (1) Delivery Issues and (2) Intrinsic Issues, and shows how the subcategories (a) availability, (b) accuracy, (c) conformity, and (d) plausibility are used to analyze and address data quality challenges.

Come back for part four, where Charlie talks about Simple Assessment Modules (SAMs) and the elegant simplicity of recursive failure.

Contact Clinical Architecture

• Tweet us at @ClinicalArch
• Follow us on LinkedIn and Facebook
• Email us at informonster@clinicalarchitecture.com

Thanks for listening!

Charlie Harp (00:02):

Hi, I am Charlie Harp, and this is the Informonster Podcast. On this episode of the Informonster Podcast, I'm going to talk about the PIQI taxonomy. This is part three of a five-part series. If you haven't heard the first two parts, then you should probably drop everything and go listen to them right now. So you might be asking yourself, "Why do we need a taxonomy for healthcare data quality?" Well, it just so happens that humans have been trying to create order in their world, to understand it and organize it, for millennia. In the archaic period, the ancient Greeks believed that illnesses were "divine punishments" and that healing was a "gift from the gods." It was all mystery and magic. Enter the great Hippocrates, father of medicine, and the first person to believe that diseases were caused naturally. In his Hippocratic Corpus, he classified diseases based on observations of symptoms and natural causes rather than supernatural blathering.

(01:01):

He was followed by Aristotle, who created a great work of philosophy called "The Categories," in which he places everything under one of 10 categories, or "praedicamenta." It is interesting to note that the root of the word "category" is a Greek word meaning "to accuse or assert." Now, Aristotle was followed by others over the years, but the person typically considered to have created the first taxonomy was Carl Linnaeus. He published the Systema Naturae in 1735, and by its tenth edition in 1758 he had formally classified 4,400 species of animals and 7,700 species of plants. He described his contribution to science as "God created, Linnaeus organized." In 1763, the French physician and botanist François Boissier de Sauvages de Lacroix (that's a mouthful) developed a categorization of 10 distinct classes of diseases, organizing 2,400 unique diseases within them. In 1853, the first International Statistical Congress in Brussels commissioned the development of a system for classifying causes of mortality.

(02:04):

This was the genesis of the first "International List of Causes of Death," which was based on the classification presented by Paris physician Jacques Bertillon in 1893 and is the precursor of the ICD classification system that we know and "love" today, right? So here we are, roughly 2,400 years after Hippocrates, and we are steeped in the mystery and superstition that is healthcare data quality. We ask ourselves, "Why is our data bad? Maybe we're being punished. Maybe there's a magical solution that will just fix it without us having to do anything?" No, dear listeners, I stand here on the shoulders of the great categorists and taxonomists who came before me and say: bad things happen to our data for reasons that can be explained and remedied.

(03:00):

We just need to take the time and apply the energy to understand them, and that's why we need a taxonomy for healthcare data quality. So let's take the next handful of minutes to talk about it. When developing a taxonomy, it's important to consider its purpose. In general, a taxonomy is a tool that helps you organize and index something. In this case, the something is qualitative issues in patient information. So to determine what kind of taxonomy we need, we need to understand the nature of what it focuses on. You also have to decide whether the taxonomy is based on predication or universal quantification. Predication groups things based on some characteristic, like Charlie as a podcaster, whereas universal quantification groups things on the basis that they're subtypes of other things, like Charlie as a human. Considering the non-biological nature of our focus, our taxonomy will be based on predication, so we're going to use characteristics to organize and define what we're talking about.

(04:05):

And so let's begin by talking about the characteristics we'll use to organize our taxonomy. At the very core of our focus is patient information. We're not just looking at qualitative issues in general; we're looking at qualitative issues in patient information, which we've organized in the PIQI data model we talked about in the last podcast episode. Now, our patient information model has two distinct perspectives when you think about the data in it. The first is as a data model with a general set of rules. It's a simple hierarchical structure with data classes that contain clinical elements. Each element is an information model with relevant attributes, each of which is either a simple data type like a number, date, or text; a codable concept, which is prevalent in healthcare; or a flexible data type, like an observation value or a range value, that can represent data as text or as structure.
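To make that first perspective concrete, here is a minimal Python sketch of the hierarchy just described: data classes contain clinical elements, and each element's attributes are simple values, codable concepts, or flexible values. Every class and field name here (`PatientRecord`, `DataClass`, `Element`, `CodableConcept`, `FlexibleValue`) is a hypothetical illustration, not the actual PIQI model from the previous episode.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Union

# Hypothetical shape of the model described in the episode; names are
# illustrative assumptions, not the published PIQI schema.


@dataclass
class CodableConcept:
    """A coded value: a code, the code system it belongs to, and a display."""
    code: Optional[str] = None
    system: Optional[str] = None
    display: Optional[str] = None


@dataclass
class FlexibleValue:
    """An observation or range value that may arrive as text or as structure."""
    text: Optional[str] = None
    number: Optional[float] = None
    low: Optional[float] = None   # lower bound when the value is a range
    high: Optional[float] = None  # upper bound when the value is a range


# An attribute holds a simple type, a codable concept, or a flexible value.
AttributeValue = Union[str, float, date, CodableConcept, FlexibleValue, None]


@dataclass
class Element:
    """A clinical element (for example, one lab result) with named attributes."""
    attributes: dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class DataClass:
    """A data class (for example, lab results) that contains clinical elements."""
    name: str
    elements: list[Element] = field(default_factory=list)


@dataclass
class PatientRecord:
    """Top of the simple hierarchy: the patient's data classes."""
    data_classes: list[DataClass] = field(default_factory=list)


# Usage: one lab result carrying a coded test, a numeric result, and a date.
glucose = Element(attributes={
    "test": CodableConcept(code="2345-7", system="http://loinc.org",
                           display="Glucose [Mass/volume] in Serum or Plasma"),
    "result": FlexibleValue(number=95.0),
    "effective_date": date(2023, 5, 1),
})
record = PatientRecord([DataClass("LabResults", [glucose])])
```

Keeping the three attribute kinds as distinct types is what lets delivery checks (is it populated, is it the right type and format) be written separately from intrinsic checks (is the code real, does the value make clinical sense).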

(05:04):

The second perspective is that, conceptually, our model is a clinically oriented construct. While the data model, with its rules and structure, is an important part of the quality of the data, so are the intrinsic plausibility and semantic credibility of the information that exists in that model. This means that the root of our taxonomy has two primary categories: delivery issues and intrinsic issues. Delivery issues are those resulting from non-compliance with the needs of the information model. Intrinsic issues are those resulting from compliant data that violates the contextual credibility of the information. Now, the purpose of our taxonomy is to shed light on the nature of qualitative issues in the data so we can provide feedback and help resolve them. When you consider what can go wrong from a data delivery perspective, it generally rolls into two broad categories.

(06:04):

Availability, where the data is unavailable in some way, and accuracy, where the data is there but it's not valid. From an intrinsic perspective, two things significantly impact the usability and credibility of the data. One is conformity, the credibility of all the coded concepts we need in clinical computing, and the other is plausibility, the contextual integrity of the data that's been entered into the patient's record.

(06:32):

Let's talk about delivery issues first. Now, once again, these are the things that can go wrong with the data from a structural perspective, and this could mean that the underlying data might be fine, but the delivery is flawed in some way. The first subcategory of delivery issues, as I said earlier, is availability. Availability has three dimensions that represent data not being available. The first is unpopulated, and these are attributes that we need that are not there.

(07:01):

The second is incomplete, where some unpopulated attributes affect something larger, leaving that larger thing incomplete. And the last is missing, where something we expect to be there is just not there. That could be anything from an entire element class, to a specific element we're looking for, to something the rest of the data implies should be there but is not. That's the first subcategory of delivery issues. The second is accuracy, and this also has three dimensions, which represent data that is not valid. The first is an invalid value, where attributes are incorrect. It could be an incorrect data type, or the value could just be generally wrong. The second is an invalid format, where attributes are there but not in the format we need to make them usable. The third is an invalid grouping, where multiple attributes in an element just don't make sense together.

(08:03):

So they are not a valid grouping; they're a mechanically invalid grouping. Now let's talk about intrinsic issues. Once again, these are things that indicate the issues are likely inherent in the original information and not necessarily in its delivery. The first subcategory of intrinsic issues is conformity. Conformity has three dimensions, which represent coded concepts that are, for lack of a better word, nonconformant. The first is an invalid concept. An invalid concept is a code and a code system that don't appear to match: you provided a code, but when I go to the authoritative source for that code, it says there's no such code. Another thing that can result in an invalid concept is when you give me a code system, a code, and a display, but when I go to the authoritative source, the display you gave me does not match the display the authoritative source provides, either semantically or lexically.

(09:03):

The second dimension is an obsolete concept. This means that the code you gave me is valid, but it's no longer considered active. That can happen for a number of reasons; it could be a bad concept that was deprecated for a good reason. Either way, you're technically not conformant. And the last is an incompatible concept, which means that we agreed upon a particular set of codes or a code system, and you did not give me a code from the code system we agreed upon. Therefore, the code you provided is not conformant. So those are the conformity dimensions. The second subcategory of intrinsic issues is plausibility. Plausibility also has three dimensions, which represent data that results in implausible scenarios. The first dimension is clinically implausible. This is where the data provided creates a situation that does not make sense from a clinical perspective.

(10:01):

The second dimension is temporally implausible, which means the data provided creates a situation that does not make sense from a timing perspective. The third dimension is situationally implausible, which is just a general bucket where the data provided creates a situation that doesn't make sense. So there you have it: an initial taxonomy with two primary categories, Intrinsic Issues and Delivery Issues; four secondary categories, Availability, Accuracy, Conformity, and Plausibility; and twelve dimensions, three under each secondary category. Now, if you're paying attention and are a taxonomy buff yourself, you'll also have noticed that our taxonomy is designed to be mutually exclusive, meaning that any category or dimension can appear in only one place. This is on purpose, since the alternative would be statistically confusing when put into practice. Now, I can imagine you're thinking to yourself, "Jeez Charlie, the symmetry in this taxonomy is pretty convenient."

(11:03):

I'll admit that I like the symmetry of this taxonomy, but I didn't force it. I fully expect that more categories will emerge over time, but I really think these are a good start for the fundamental issues we're dealing with today. And I needed to stop. Perfection is the enemy of good. Now, before I wrap things up, I should point out that the simple assessment modules, or SAMs, that we'll be using to perform our assessments can each be assigned to only one PIQI taxonomy dimension. Because a SAM performs a very specific evaluation that uncovers a more granular reason for a given issue, SAMs essentially extend the taxonomy organically as new ones are developed, but more on that next time.
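The taxonomy summarized in this episode (two primary categories, four secondary categories, twelve mutually exclusive dimensions), together with the rule that each SAM maps to exactly one dimension, can be sketched as plain data plus a couple of checks. This is a hypothetical Python illustration, not Clinical Architecture's implementation: the dimension labels, the `TAXONOMY` dictionary, the `SAM` class, the `assess` function, and the toy `KNOWN_CODES` set standing in for an authoritative code system are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical encoding of the initial taxonomy:
# primary category -> secondary category -> dimensions.
TAXONOMY: dict[str, dict[str, list[str]]] = {
    "Delivery": {
        "Availability": ["Unpopulated", "Incomplete", "Missing"],
        "Accuracy": ["Invalid Value", "Invalid Format", "Invalid Grouping"],
    },
    "Intrinsic": {
        "Conformity": ["Invalid Concept", "Obsolete Concept", "Incompatible Concept"],
        "Plausibility": ["Clinically Implausible", "Temporally Implausible",
                         "Situationally Implausible"],
    },
}


def dimension_index(taxonomy):
    """Map each dimension to its single (primary, secondary) path.

    Raises ValueError if a dimension appears in more than one place,
    which would break the taxonomy's mutual exclusivity.
    """
    index = {}
    for primary, secondaries in taxonomy.items():
        for secondary, dimensions in secondaries.items():
            for dim in dimensions:
                if dim in index:
                    raise ValueError(f"{dim!r} appears in more than one place")
                index[dim] = (primary, secondary)
    return index


DIMENSIONS = dimension_index(TAXONOMY)


@dataclass(frozen=True)
class SAM:
    """A hypothetical simple assessment module bound to exactly one dimension."""
    name: str
    dimension: str
    check: Callable[[Optional[str]], bool]  # returns True when the value passes

    def __post_init__(self):
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension {self.dimension!r}")


# Toy stand-in for an authoritative code system (an assumption for this sketch).
KNOWN_CODES = {"A1", "B2"}

sams = [
    SAM("value_is_populated", "Unpopulated", lambda v: v not in (None, "")),
    SAM("code_exists_in_source", "Invalid Concept", lambda v: v in KNOWN_CODES),
]


def assess(value):
    """Run every SAM and report each failure as its full taxonomy path."""
    return [(*DIMENSIONS[s.dimension], s.dimension, s.name)
            for s in sams if not s.check(value)]
```

For example, `assess("Z9")` returns `[("Intrinsic", "Conformity", "Invalid Concept", "code_exists_in_source")]`, while `assess("A1")` returns an empty list. Because `dimension_index` refuses duplicates, placing a dimension under two parents fails immediately, and because every failure carries a single path, counts can be rolled up by dimension, secondary category, or primary category without double counting.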

(11:53):

Well, hey, I hope you enjoyed today's episode, because it did have a tax on-a me. Eesh, that sounded just as bad out loud as it did in my head. All right, well, hey, don't hold it against me, and come back for the next episode of the Informonster Podcast, where we'll be talking about simple assessment modules and the elegant simplicity of recursive failure. I'm Charlie Harp, and this has been the Informonster Podcast. Thanks a lot.