Streaming Audio: Apache Kafka® & Real-Time Data

Building a Microservices Architecture with Apache Kafka at Nationwide Building Society ft. Rob Jackson

Confluent, original creators of Apache Kafka® Season 1 Episode 142

Nationwide Building Society, a financial institution in the United Kingdom with 137 years of history and over 18,000 employees, relies on Apache Kafka® for their event streaming needs. But how did this come to be? In this episode, Tim Berglund talks with Rob Jackson (Principal Architect, Nationwide) about their Kafka adoption journey as they celebrate two years in production. 

Nationwide chose to adopt Kafka as a central part of their information architecture in order to integrate microservices. You can't have them share a database, because that's design-time coupling, and if you have them call each other synchronously, there's a little too much runtime coupling. That has led to the rise of event-driven reactive microservices as a stable and extensible architecture for the next generation.

Nationwide also chose to use Kafka for the following reasons:

  • To move their mortgage sales systems from a traditional orchestration style to event-driven designs and choreography-based solutions using microservices and Kafka
  • To scale their mainframe systems cost-effectively with change data capture (CDC)

Rob explains to Tim that now with the adoption of Kafka across other use cases at Nationwide, he no longer needs to ask his team to query their APIs. Kafka has also enabled more choreography-based use cases and the ability to design new applications to create events (pushed into a common/enterprise event hub). Kafka has helped Nationwide eliminate any bottlenecks in the process and speed up production. 

Furthermore, Rob delves into why his team migrated from orchestration to choreography, explaining their differences in depth. When you start building your applications in a choreography-based way, you will find as a byproduct that interesting events are going into Kafka that you didn’t foresee leveraging but that may be useful for the analytics community. In this way, you can truly get the most out of your data. 

Tim Berglund:
Nationwide Building Society is what most of us laypeople would call a large and venerable bank in the United Kingdom. They're also pretty sophisticated users of Kafka and event streaming in general. Today I talk to Rob Jackson, a principal architect there about choreography versus orchestration, replacement of traditional messaging systems, GDPR, data governance, and more. It's all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund:
Hello and welcome to another episode of Streaming Audio. I am as I so often am, your host, Tim Berglund, and I am joined in the virtual studio today by Rob Jackson. Rob is a principal architect at the Nationwide Building Society. And Rob, welcome to the show.

Rob Jackson:
Thank you very much, Tim. Nice to be here.

Tim Berglund:
We've got a lot of cool stuff to talk about today about Nationwide's use of Kafka and really your adoption journey. But first of all, for listeners outside the UK, I know if you're in the UK you know what Nationwide is. If you're in the United States, you think it's an insurance company because there's an insurance company in the US called that. So what is a building society? That sounds like people who have cranes and hammers and saws and steel things to me. What's a building society and what is Nationwide?

Rob Jackson:
Okay. So a building society is an organization owned by its members instead of being owned by shareholders. So we have about 16 million members and those are the people we answer to. We offer full banking and financial services in the UK. We've been around a long time, about nearly 180 years, I believe. That's Nationwide Building Society. We have a similar logo to the US Nationwide, so we're often confused with each other in the States.

Tim Berglund:
That's okay. There's a very popular Wiki product made by Atlassian that has a very similar name to Confluent, and sometimes people get confused about that. Oh yeah, you work for them. I use Confluence and Jira. I'm like, "No, that's not what I said."

Rob Jackson:
I may have made that mistake myself a couple of times.

Tim Berglund:
Yeah. It does happen. Anyway. Okay. So building society and anybody who actually knows the legalities of this thing is probably going to chuckle knowingly. But that sounds like what we call in the United States, a credit union, although credit unions tend to be small and regional and you don't get the venerable two-century sort of ages. But they're like banks but member-owned and there are some other regulatory differences a guy like me wouldn't know.

Rob Jackson:
Okay. It sounds similar.

Tim Berglund:
Yeah. To a first-order approximation, a bank.

Rob Jackson:
Yeah.

Tim Berglund:
So you guys use Kafka and I kind of want to just really go through your adoption journey because you and I have talked about this stuff a little bit and there's a lot of cool things that have happened along the way. And I know the architectural concerns that you're thinking about are, in my opinion, precisely the things you should be thinking about, and that always makes me excited. But talk about how Kafka got started. What was the first use case and pain that you thought Kafka would solve and approximately when was that?

Rob Jackson:
Okay, well, first of all, it's good to hear that you think we are thinking about the right things. That's always a nice vote of confidence. So we've been using Kafka for about... I think we've been in production for about two years now, and we had two separate use cases that started around the same time but by quite different routes. So the first use case we started with was around starting to replace our mortgage sales systems or mortgage origination systems. So we have a system which is... or we had a system in use at the time that had been around a few years. It was a traditional orchestration-style system.

Rob Jackson:
Selling a mortgage is a very long-running process, a very complex process, with lots of handoffs to different people, both back-office and different teams. And we wanted to start incrementally replacing it. And when we looked at patterns for replacing it and giving us the agility to change processes, the ability to have separate teams working independently, we started to consider event-driven designs and choreography-based solutions. And that was one adoption of Kafka, and we were very interested in Kafka for its durability, its persistence, scalability. But also there was a real buzz around Kafka and I'd say that the engineers wanted to use it. And so there's a lot of demand from kind of bottom-up, I suppose, or from the engineering folks to use Kafka.

Rob Jackson:
So that was kind of one use case. The other use case we looked at the time was around, so in the UK we have a regulation called open banking, which is about increasing competition in the banking sector and exposing data through APIs that aggregators could use to kind of aggregate information from multiple banks and put it together. This meant that those aggregators could start hitting our systems through these external APIs and start generating a lot of volumes against our backend systems, and probably scaling that volume up much more quickly than we were used to. So we started to-

Tim Berglund:
Was this the regulation that has recently landed on you, the open banking stuff?

Rob Jackson:
That's correct.

Tim Berglund:
Okay.

Rob Jackson:
And when we started thinking about how should we respond to that regulation, we have to expose these APIs, we have regulation in terms of the availability of those APIs. And we looked at different ways to meet that regulation. And it would involve things like scaling some of our mainframe systems, scaling mainframes. Mainframes can clearly scale to huge volumes, but it tends to be quite a... it's a pre-planned process. Tends to be quite costly and we were looking for more cost-effective ways-

Tim Berglund:
Let's just say you were paying by the drink there. So scaling up your mainframe process, like you said, it works, but you're going to pay for it.

Rob Jackson:
Yeah. And it's a scale-up kind of architecture really rather than a scale-out. And it's not really a cloud architecture that's coming, it's maybe an option, but it's not perhaps the most obvious cloud use case.

Tim Berglund:
That is an excellent understatement with which to begin our conversation. It's not obvious how you can run mainframes in the cloud.

Rob Jackson:
Yeah, that's right. So what we looked at as ways to... I think this is liberating our data and I probably have some of my colleagues telling me off for using that word, but liberating the data from our mainframes through technologies like change data capture, that push that data into somewhere else where we can create caches and push that data all the way out to the cloud where we can build APIs in the cloud where we offload that workload into the cloud that's not even in our data centers. So when we were looking at the architecture for that, that's what's interesting, NoSQL databases and things like Redis caches. You don't always know how to stretch those APIs at the beginning of the project, but you do know that you need high resilience, high availability, high throughput, all those sort of things.

Rob Jackson:
And I think I was probably influenced quite heavily by Turning the Database Inside Out from Martin Kleppmann, but looking at change data capture into Kafka, doing stream processing and then creating materialized views off the back of those Kafka topics seem to meet all the requirements around performance, resilience, availability, all those kinds of things that we had to do, but it also enabled us to get digital agility, move that data to the cloud, start producing domain-specific caches and introduce events and notifications into the society. So it's an architecture that captured a lot of imagination and generated a lot of interest internally.

Tim Berglund:
There are a few things there to drill into. I want to back up. I would just like to talk about mainframe applications in the cloud for a little while because that's just so fun in a-

Rob Jackson:
Okay. Probably not the right person to speak to there.

Tim Berglund:
Through the looking glass kind of way. It's a conversation you want to have with a Cheshire Cat or something. But you mentioned migration from orchestration to choreography and we have recently discussed those on the podcast, but for anybody who's just coming on this episode, if you could give us a quick definition of how you see the difference between those, then I want to drill into what was going on in your own system.

Rob Jackson:
Okay. So with our existing architecture maybe... I mean, choreography and orchestration have been around a long time, but I think choreography has increased in popularity since Kafka became more popular. But orchestration is where you tend to have a process controller, something in control of an interim process saying... a customer starts a process, let's say, and the orchestrator would say, call one system, get a response back from that system. And depending on what data's come back, you would then call another system. You'd have to deal with whether that system is available or not. It might have to put that process to sleep if it's a long-running process and you're waiting for a person to do something, but it's basically in control of that process. That's orchestration. Choreography-

Tim Berglund:
The nice thing about that being that you do have the code in one place as it were. You can look at the orchestration code, and hopefully, this has been... like you said, it's not a new idea. And occasionally that "code" is XML and things like that historically. I mean, there's all kinds of perfidy I think that vendors have imposed on us.

Rob Jackson:
And it's interesting, you say that the advantage of that is that the code is all in one place. You could then probably flip that around and say the disadvantage of that is that all the code is in one place.

Tim Berglund:
Exactly. Which gets us to choreography.

Rob Jackson:
Yeah. So choreography would be if you imagine an event being used to trigger the execution of some code, so you'd write something like a microservice that would be a Kafka topic consumer and a particular event has been raised. You run some workload which might be writing to a database, it might be calling another system. It could be waiting for a user to then do something. And when you've done that work, you raise an event. But the interesting thing is you don't know as the raiser of that event, who is going to consume that event. So you haven't got a single thing in control. You've got microservices raising events and consuming events.
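
The choreography Rob describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EventBus` class is a toy in-memory stand-in for Kafka topics, and the topic and service names are hypothetical rather than anything Nationwide uses.

```python
from collections import defaultdict


class EventBus:
    """Toy in-memory stand-in for Kafka topics (hypothetical, not the Kafka API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> handler functions
        self.log = []                         # every event published, in order

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.log.append((topic, event))
        # The publisher does not know or care who is listening.
        for handler in list(self.subscribers[topic]):
            handler(event)


bus = EventBus()


def valuation_service(event):
    # Consumes one event, does its work, then raises another event.
    bus.publish("valuation-completed",
                {"application_id": event["application_id"], "value": 250_000})


def offer_service(event):
    bus.publish("offer-issued", {"application_id": event["application_id"]})


bus.subscribe("application-submitted", valuation_service)
bus.subscribe("valuation-completed", offer_service)

# A new consumer is added later without touching the existing services.
audit_trail = []
bus.subscribe("valuation-completed",
              lambda e: audit_trail.append(e["application_id"]))

bus.publish("application-submitted", {"application_id": "A-1"})
topics_seen = [topic for topic, _ in bus.log]
```

No single function in the sketch holds the whole process. The flow emerges from which services subscribe to which topics, which is why a new consumer can be attached without changing the producers.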

Tim Berglund:
Microservices don't have psychology, the people who write them do. And the psychologically difficult thing here is that you have to be willing to do your work and announce it to the world, produce the event to the topic and allow other services to pick that up and go forward. There isn't one place where you see all this happening. I mean, the metaphor, don't lose the metaphor, dear listener, between orchestration and choreography. In orchestration, there is a conductor who is really calling the shots and keeping the time. And of course, all the musicians, they've got music in front of them and skill, and they've got timing too, but it falls apart without that conductor unifying the orchestra. In choreography... and if you haven't ever taken any ballroom or Latin dance, I can only recommend it so often. I don't know how to convince you. You really should. Everybody, if you're physically able to, it's just a really good thing to do.

Tim Berglund:
And you find that there's an interplay and there could still be leadership in a dance, but one person makes a decision and the other person is then free to respond to that decision. And that's the same way with choreographed microservices. The service gets an input, does this computation, produces its result to a topic and then some other services decide to do things with that. And you kind of have to let your services be adults. You wouldn't manage people like that. You wouldn't relate to your spouse like that. Like, "Oh, are you going to do that thing that I asked you to do? Is it done yet? Is it done yet?" It'd be terrible. So that's kind of the hump you have to get over in going from, really I'll say the older orchestration mindset to the newer choreography mindset that you're describing.

Rob Jackson:
And then you can start to think about what advantages do I get from going to choreography. So why would I do that? Because I think it does bring with it some complexity. If you're talking about a process that is complex, you might need to measure service level agreements on it and so on, then you do have some complexity and probably less mature tooling in terms of monitoring the execution of that process as well.

Tim Berglund:
You don't say.

Rob Jackson:
Yeah. So you talked about microservices consuming that topic. And just thinking about that for a second, you said microservices, plural, and it might start off as one consumer of that event might be interested in it, and that's all that you need to get your process up and running. But then somebody else is interested in that event, or you want to add a new step into that process that runs in parallel with other steps. These are the sorts of things that you can add without changing. I guess you're reducing your blast radius of change and your consumer doesn't need to change any of the other microservices. It's just a new consumer of a topic in an event.

Tim Berglund:
Yeah. Let's see, we're talking about choreography and orchestration. There was one other thing I wanted to say. For really complex things, sometimes there are tremendously complex things like Baroque chamber music or something like that, that are accomplished through orchestration. It's not like it doesn't work. But most of the most complex and most useful things that we see in our lives, like say a broadly market economy, look more like choreography. There are a bunch of independent actors making decisions and exchanging information and exchanging value, and stuff happens that you can't... But generally good things happen, most of us think, that you can't predict and that nobody's designing, and you get interesting emergent phenomena. And of course, in enterprise architecture, yes, someone is designing it. There's a Rob Jackson somewhere who's saying, "Hey, we need this business outcome to happen and the services need to get written."

Tim Berglund:
Then there's another architect who says, "Well, this data is here, and other services can sort of grow up in the soil of this thing." So it's like an economy. It's like an ecosystem. A bunch of things, making simple decisions and simple interactions and complex behavior emerging from that.

Rob Jackson:
I mean, I'm sure we'll get onto it later, Tim. But as soon as you start building your applications in this way, you start to find as a by-product that you're getting interesting events going into Kafka that you haven't really foreseen what you'll use those events for later. So suddenly, you might find that the analytics community is suddenly interested in those events. They never saw them before. They're now events that they can consume. So I think you get a lot of byproducts. Perhaps that's when you put choreography and Kafka together, perhaps.

Tim Berglund:
Yeah. And the way I normally see this is there are, broadly speaking, two motivations for people to adopt Kafka as a central part of their information architecture. One is to integrate microservices, I'll say. You can't have them share a database; that's design-time coupling. Maybe you tried having them call each other synchronously. There's a little bit too much runtime coupling there and so people are increasingly landing on event-driven reactive microservices as a good and stable and extensible architecture kind of for the next generation. So the one motivation for Kafka is basically I've got microservices and help.

Tim Berglund:
The other one is, things are happening to me and I need to interpret those things and make decisions about those things very quickly. And so the 30-year-old ETL paradigm can't help you. Absolutely doesn't do that thing, and so you've got the real-time analytics motivation. These both meet in the back though. You start doing one and you figure out that, oh, I can do the other. So like you say, you start with microservices but wait, I have all these events. I can do analytics on these. So tell us as your adoption journey continued, did interesting things happen there at Nationwide?

Rob Jackson:
Well, I guess you then put together the other use case I started talking about, which was the mainframe offload. And there what we saw was, you've got database changes going on. We were using stream processing to take those database-level changes and turn them into events which we could then load into a database where we could query them so we could have a very much up-to-date queryable cache of mainframe data. But we started with one mainframe, we've added a second mainframe now, but there is a whole range of data sources that you start to bring into that world. You've got more and more events being derived from those database changes. And databases are not necessarily the best place always to get your events from. It would be very nice if your applications were built in the way we've just been talking about with choreography where you just get events as part of the application build.
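
To make the mainframe offload pattern concrete, here is a minimal sketch of turning row-level change events into a queryable cache. The event shape (`op`, `key`, `after`) and the sample customers are assumptions made for illustration; real change data capture tools each define their own formats, and this is not Nationwide's pipeline.

```python
def apply_change(view, change):
    """Apply one row-level change event to a cache keyed by primary key."""
    key = change["key"]
    if change["op"] == "delete":
        view.pop(key, None)          # the row is gone from the source
    else:
        view[key] = change["after"]  # insert or update: keep the latest row image
    return view


# Hypothetical change events, in the order the source database committed them.
changes = [
    {"op": "insert", "key": "cust-1", "after": {"name": "Ada", "postcode": "SN1"}},
    {"op": "insert", "key": "cust-2", "after": {"name": "Bob", "postcode": "BS1"}},
    {"op": "update", "key": "cust-1", "after": {"name": "Ada", "postcode": "SN2"}},
    {"op": "delete", "key": "cust-2", "after": None},
]

customer_view = {}
for change in changes:
    apply_change(customer_view, change)
```

After replaying the changes, the view holds only the current state of each surviving row, so API queries can be served from it without touching the mainframe.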

Rob Jackson:
But a lot of our legacy applications predate Kafka by, in some cases, 40 years. So getting events out of the back of those using change data capture has been really useful for us, first of all to create that database of queryable data for the open banking regulation we talked about. But off the back of that, you start to get more and more events coming. So the fact that a customer record has changed in a database, that's actually a really interesting event: the customer has changed something about them. What do I do with that? Who's interested in that? What different parts of the organization are interested in that? Before, they would have got to know through maybe some ETL processes, or if it's really important, maybe through an orchestration, kind of SOA-type architecture where they were told through an API.

Rob Jackson:
But what you start to see is, you start building your applications using event-driven designs. You start to bring change data capture into the estate and you can quickly start to build up a good catalog of events. Now, we're still pretty early on in terms of what else do we use those events for outside of what they're actually being built for. So yes, we've got the queryable cache. Yes, we've got the event-driven design for mortgages, but the analytics use cases, pulling all that data together to do say, fraud analytics, those things are only just starting to happen now. But yeah, it's early days for those. What I think of is a snowball effect. Once you get more and more events in there, you get to do more and more interesting things. I'm kind of really excited and waiting for that to happen, and the sooner the better, I guess.

Tim Berglund:
Oh yes. As you say, it's there. So the demand for access to that data and teams of yours doing interesting value-added computation around that data and exposing it, that demand's going to show up.

Rob Jackson:
Definitely. And I think the sooner we get that data from... We tend to have these on-prem Kafka clusters at the moment, and perhaps we're rebuilding silos of information. But where I see this going is we start to either generate those events directly in the cloud or we replicate from on-prem into the cloud. And I think what we start to see, or what I'm hoping for, is that we start to break down those silos and start to have easier data sharing across the organization. And I think the cloud is an enabler for that. And of course, you can rebuild data silos in the cloud if you try, but that's not what we're trying to do.

Tim Berglund:
Of course not. I appreciate that aspiration because I share it because I know architecturally how event streaming can break down data silos. It is in fact possible to do, and I have this really clear picture in my head how relational databases as the center of an application, the architectural decision to build applications, not around a log but around a mutable update-in-place data store. Not that there's a thing wrong with relational databases, but their triumph beginning in the late '80s, early '90s, and the application architectures that emerged around them, giving rise to what we used to call the departmental database. Those were silos.

Tim Berglund:
And kind of the shape of that technology was to have this database of a certain scale that wasn't too small and wasn't too big. And then an application that grew around that, and that was just what naturally you wanted to do. You pick the tool and the tool shapes the kind of work that you do. And I see with event streaming, the bent of the tool, the architectural opinions of that piece of infrastructure are more towards integrating and less towards siloing. But like you said, if you can, you want to break them down. You know there are other drivers in the system, like people, like organizational structure, like turf, and who loves who, and all of these kinds of things that are other reasons we have silos.

Tim Berglund:
You can't just blame databases. We kind of love our silos. I'm super interested because I've got the vision and there are like a few people doing it, but whether, with widespread adoption of event streaming, I can look back in 10 years and say, "Oh yeah, the silo thing, those got broken down. Mostly that was good," or, "That part of the vision wasn't realized," we have to see.

Rob Jackson:
Yeah. Being a bank we care deeply about security, data protection, or those sorts of things, and you have to balance. So data silos are often done for, I don't know, less good reasons. I created this database therefore it's mine, and I'm not going to let anyone else have any access to it.

Tim Berglund:
Exactly.

Rob Jackson:
But there are also valid reasons for the kind of, I guess, controlling and limiting access to your data. But what I want is the technology not to get in the way. So if we decide something is an interesting event and can be consumed by lots of different parts of the organization, then let's make it really easy for them to do it with the right controls in place.

Tim Berglund:
Right. Regulatory silos are fine. That's your government telling you what to do. And as I say, they have guns. And so yeah, we do that. But kind of the social silos, we'd like to not encourage those. I just like to go down that path for a little bit. Compliance and security, you're one of the most heavily regulated industries there is. And those laws differ from country to country and in the US they differ from state to state, but they're more alike than not. So what do you deal with there? I'd love it if you could talk about GDPR. Again, if you look at sort of the opinions and the worldview of Kafka as the infrastructure it's, I'm going to remember everything forever and never change it. And then there's this law that says, oh, no, you have to be able to forget things. So how does that play together?

Rob Jackson:
So of course we talk about immutable logs remembering everything forever. And as you just said, General Data Protection Regulation or what used to be called Data Protection Act in the UK, a customer, if they stop being a customer of ours, we have to remove their data after a certain amount of time or they have the right to be forgotten as well and the right to correct. So this is where we use things like topic compaction. I mean, the easiest thing is you don't store the data long-term in Kafka. And if you don't need to store the data, that's the easiest way to not worry about it at that layer. Clearly, there are other places where you do have to worry about it, like in your databases.

Rob Jackson:
But in our case, we wanted to store that data long-term, we wanted to be able to say that Kafka is going to be a long-term event store.

Rob Jackson:
It may also be that long-term store for change data capture, where we are putting database changes into Kafka and then materializing it in different places, different technologies. So we wanted to store it long-term. So then you immediately run into, well, if we delete a record from our database, then it needs to be gone from Kafka as well. And of course you get the delete event into Kafka, but that doesn't mean that all the earlier records are gone if you've got long-term retention turned on. So this is where you look at things like topic compaction, making sure that your consumers of that topic are processing those topics in a reasonable time and removing it from any caches, or you might consider those caches in memory and rebuilding them from time to time anyway. But topic compaction is the primary way in how we meet GDPR regulation.
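
The effect of topic compaction on a deleted record can be illustrated with a small simulation. This is a conceptual sketch only: the `compact` function and the sample keys are made up for illustration, a `None` value stands in for a delete marker (tombstone), and real Kafka compaction runs in the background on closed log segments and keeps tombstones for a configurable period before dropping them.

```python
def compact(log):
    """Keep the latest record per key; drop keys whose latest value is a tombstone."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None           # None marks a deleted key
    )
    return [(key, value) for _, key, value in survivors]


log = [
    ("cust-1", {"name": "Ada"}),
    ("cust-2", {"name": "Bob"}),
    ("cust-1", {"name": "Ada L."}),
    ("cust-2", None),                  # delete event for cust-2
]
compacted = compact(log)
```

Before compaction, the earlier `cust-2` record is still present in the log even though a delete has been written. After compaction, only the latest `cust-1` record remains.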

Rob Jackson:
And then you have things like encryption as well. So you've got things like payment card industry regulation. So credit card numbers are a good example of data that has to be encrypted at rest and in transit. And this is where you're looking at things like encryption that you'd need to add on. So in our case we're actually encrypting data before it even goes into Kafka. We're also doing encryption of the storage. And when we go to cloud, we have to do things like bring our own key. We can't rely on cloud-provided keys and so on. So there are ways to meet it. It's not always easy, but you can meet the regulation with Kafka and this kind of architecture. So it's just, you have to think about it upfront.

Tim Berglund:
Right. And that, bring your own key and encrypted data at rest, these are recent additions. And when I say recent, the date we're recording happens to be Friday, November 13th, 2020. Nobody's talking about how scary it is that there's a Friday the 13th in 2020. I'm fine with it, but mid-November 2020, it's a recent feature rollout to Confluent Cloud. If I may just shill for a moment, the ability to bring your own key, because all the data at rest has always been encrypted in cloud. But for folks like you who for regulatory reasons can't let some dang cloud vendor pick your key, because who are we? You can pick your key and that works fine.

Tim Berglund:
So this message is brought to you by Confluent Cloud. Thank you for listening. I'm always curious to talk to people who do this, where you've got customer data and topics and it's retained long-term and you need to be able to forget it where you encrypt it with a customer specific key, manage that key elsewhere and destroy the key when you need to forget the customer. Have you found that to be practical?

Rob Jackson:
It's not something we do. So this is the way some technologies work. That way is not an approach we've chosen to follow. So we would tend to use topic compaction for that kind of thing. So we always hold the latest rather than holding a history and throwing away the key.

Tim Berglund:
And the amount of time it takes to trigger a compaction and all that kind of stuff, regulators are satisfied with this. Because I mean, there's a point at which there's a regulator who has to think about the way the infrastructure works and make sure that is likely enough to satisfy the law. But topic compaction works for you guys, I guess is the thing.

Rob Jackson:
Yes. And of course, topic compaction is one aspect to it, but applications are not... the Kafka topics themselves are not being queried by our applications to expose data directly. I'm going to explain that a little bit what I mean by that. So you get a data change coming through into a system of record, like a backend system holding customer data. That data change goes into Kafka and we're processing that within a couple of seconds or less. In that materialized view, you've got the up-to-date data straight away. So topic compaction is about just making sure that the data is cleaned up as in overnight within a couple of hours in the Kafka topics so it's just keeping the size down, and making sure that data is gone when it's needed to be gone from the Kafka topic, it's deleted, versus that query aspect to it. Does that make sense?

Tim Berglund:
No. Bring me through that again.

Rob Jackson:
Okay. So you're kind of saying-

Tim Berglund:
What I meant was, with topic compaction, you can only see the most recent value. The other one is kind of back there on disk at some point. And until that log segment is actually compacted, it'll be there on disk. You can't see it, but it's there.

Rob Jackson:
But topic compaction is one of the things which happens in the background and you can't always say exactly when it's going to happen. So there could be some time before topic compaction takes place.

Tim Berglund:
And I guess my question is, nobody's worried about that, from a regulation standpoint?

Rob Jackson:
Well, and that's what I was trying to kind of get to the bottom of really. Let's say you've deleted the record in your master data source. That delete event goes into Kafka straight away. That's going in within a second. And then where we're building materialized views of that Kafka topic, our systems are designed to make sure it's also processed that delete event within a couple of seconds. So any materialized views off the back of that topic are reflecting that change almost instantaneously, regardless of the compaction going on in the background.

Tim Berglund:
Got it. The fact that there's compaction and there's that value in a log segment somewhere on a disk, it's invisible to the application, it's technically formally not possible to read.

Rob Jackson:
That's right. And then it's gone from the Kafka topic as soon as compaction takes place as well.
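
The exchange above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kafka client code and not Nationwide's implementation: the `CompactedLog` class, the `apply_to_view` helper, and the customer keys are all hypothetical, and compaction here drops deleted keys immediately, whereas a real broker keeps delete markers around for a configurable period first. It shows the view reflecting a delete at once while the old value lingers in the log until compaction runs.

```python
# Toy model of a compacted topic feeding a materialized view.
# All names are hypothetical; this is not Kafka client code.

TOMBSTONE = None  # a record with a null value marks a key as deleted


class CompactedLog:
    def __init__(self):
        self.records = []  # append-only list of (key, value)

    def append(self, key, value):
        self.records.append((key, value))

    def compact(self):
        """Keep only the latest record per key and drop deleted keys."""
        latest = {}
        for key, value in self.records:
            latest[key] = value
        self.records = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


def apply_to_view(view, key, value):
    """Update the materialized view as soon as an event arrives."""
    if value is TOMBSTONE:
        view.pop(key, None)
    else:
        view[key] = value


log = CompactedLog()
view = {}
events = [("cust-1", "Alice"), ("cust-2", "Bob"), ("cust-1", TOMBSTONE)]
for key, value in events:
    log.append(key, value)
    apply_to_view(view, key, value)

records_before = list(log.records)  # old value still sits in the log
log.compact()                       # background clean-up, whenever it runs
records_after = list(log.records)
```

Before `compact()` is called, `view` already has no entry for `cust-1`, even though `records_before` still holds the original record.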

Tim Berglund:
Good enough. As you've gone through your adoption journey, what has the discussion been about Kafka versus traditional messaging systems? Have you had to make the argument of why it's different? Have they been complementary? Do you see Kafka displacing messaging systems? If I could ask you, begin with your account of how Kafka and traditional messaging systems are different.

Rob Jackson:
Let's think. I sometimes explain Kafka to colleagues by starting with, "You know how a traditional messaging system works." You put a message onto a queue, and that could be any number of different queue technologies, and someone else will read that message off the queue. And then typically it's removed from that queue.

Tim Berglund:
Right.

Rob Jackson:
Also, it may well be persisted to disk, that message, for resilience purposes in case of kind of infrastructure outage. But that's not the primary design choice of that queue. That's the kind of, how do I make it resilient? So we kind of know that's how queues work. And then the way of looking at Kafka is you can say, well, it's like that, but it's designed to store data. So you can think of Kafka as a distributed data store with a messaging API on top of it that allows you to interact with that data through the ways we're used to interacting with a message queue. So that's one way that sometimes works with people.

Rob Jackson:
Of course, there are times when queues are still appropriate. So queues are still useful for some use cases and might even be simpler to use than Kafka. And we get into debates sometimes about whether I want to just get a message from A to B within my line of business. I'm sending commands. These are not events. No one else is interested in it and I'm familiar with this technology. Can I still use it? And then yeah, there are still valid reasons to use queues. So I don't think queues are necessarily going away, but we're seeing perhaps fewer use cases for traditional queues and more use cases for Kafka. And I think it's because people realize they actually want Kafka for lots of reasons. And if we're using Kafka anyway, and it can do the messaging use cases, the queue use cases, well, let's just standardize on that as a piece of technology.

Tim Berglund:
Right. Which you almost always can. Occasionally somebody will say, "No, I need a message to be destroyed as soon as I read it." And they'll convince themselves that that's a hard requirement and maybe be right. And so you do have this place for queues, but I see the same thing, where Kafka is a superset, not a strict superset. Like I said, there are some little bits of queue functionality that are just outside of Kafka, but broadly it's a superset. It does some other nice things and feels competitive just in the architectural sense, like it's a substitute.

Rob Jackson:
Mm-hmm (affirmative).

Tim Berglund:
Oh, go ahead.

Rob Jackson:
I was just going to say that the big difference, I think, for people is when you realize that Kafka is designed to store large amounts of data long-term, and then you start to realize, well, that's interesting because then I can add multiple consumers, add consumers later. It doesn't sound like a big change from a queue. And I know there are other differences between Kafka and queues, but to me that's one of the main ones. And it's the use cases that it brings in that I think are really interesting with Kafka.
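
A minimal Python sketch can make that difference visible. It assumes nothing about any real messaging product: `DestructiveQueue`, `RetainedLog`, and the consumer names are hypothetical stand-ins. The queue removes a message once it is read, while the log keeps every record and lets each consumer track its own offset, so a consumer added later still sees the full history.

```python
from collections import deque


class DestructiveQueue:
    """Queue-style delivery: a message is removed once it has been read."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class RetainedLog:
    """Log-style delivery: records stay put; each consumer tracks an offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer):
        offset = self._offsets.get(consumer, 0)
        batch = self._records[offset:]
        self._offsets[consumer] = len(self._records)
        return batch


queue = DestructiveQueue()
log = RetainedLog()
for event in ["opened", "deposited"]:
    queue.send(event)
    log.append(event)

first_reader = [queue.receive(), queue.receive()]
second_reader = queue.receive()    # nothing left for anyone else

billing = log.poll("billing")
analytics = log.poll("analytics")  # added later, still reads everything
```

The per-consumer offset is the design choice that matters here: reading does not change the stored data, so adding a consumer never takes anything away from the others.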

Tim Berglund:
Yeah. And I say, I think fairly frequently on this podcast, it feels like I say it frequently, maybe I say it to myself, but the people who call Kafka topics Kafka queues, I correct them politely but firmly. I get it, there's sort of queuing semantics kind of over in the corner winking at you, but queues aren't persistent, logs are, and that's always how I give that definition too.

Rob Jackson:
I haven't made that mistake today. Have I, Tim?

Tim Berglund:
No, sir. You have not. No, I actually would have covered that. I would've made a note in the notes for the editor to edit that out, but I haven't made any of those yet.

Rob Jackson:
I'm sure I'll do it later.

Tim Berglund:
Right. You've touched on this a few times and you just kind of were, but the difference between Kafka and databases, I personally see that as a more interesting question and one that implies more coexistence into the future. But what do you see, and how do you answer the, is Kafka a database question?

Rob Jackson:
I don't know if I'm allowed to do this. I think I could say there's a really good presentation I saw the other day on something like, is Kafka more ACID than your database? And who's the presenter again, Tim?

Tim Berglund:
He was this American guy. He's got a beard, kind of a little gray in it. Tim, it was me.

Rob Jackson:
I think that's a good presentation on comparing Kafka and databases.

Tim Berglund:
Thank you.

Rob Jackson:
And I think, for us anyway, the confusion arises as you start to explain the differences between Kafka and queues, and you say, well, Kafka is this distributed data system that you can interact with through a messaging API. And then you start to talk about things like Interactive Queries, and you've got RocksDB embedded in there so you can do key-value lookups, and the Streams APIs require state, and you can use key-value lookups there. Then you start to get the database people interested. And I guess the database people are interested anyway, because as soon as you start talking about storing data long-term then you get, well, is it a database then? I know you can answer this better than I can, Tim, but certainly you've got the ACID semantics, and the long-term durable persistence is there, but what you don't get is the rich query capabilities you get with a database. So yes, you can do some simple stuff with Interactive Queries, but there's a whole bunch of use cases you still need databases for.

Tim Berglund:
Yeah. With Interactive Queries, you can build little hash tables, you can build key-value stores out of things, which is super useful. When we say database, that's not what we mean. I mean, 10 years ago, we went through this exercise of trying to reprogram ourselves to have a broader definition of what database meant. In 2005, if you said database, it just meant relational database. And there wasn't anything else, and the old timers would say, "Oh yeah, I used an object database once and it didn't go anywhere." And a few guys in the corner are like, "Well, XML." But database meant relational database, then NoSQL happened and we all realized, okay, it might not be relational.

Tim Berglund:
But most of those still gave us rich query capabilities and we just expect that. So my answer is that Kafka is not a database, it's more of a database construction toolkit, and there are layers that are emerging on top of Kafka, like ksqlDB, that aspire to much more database-like functionality, but they're different things. And you're just not going to find relational databases going away. In one of your primary use cases, you described the mainframe offload. You have a mainframe, you change data capture into topics, and maybe you've got applications that grow up around those topics and process them directly, but probably then you just Kafka Connect them into a relational database and you do queries on that.

Tim Berglund:
Now, if you need to do interesting and rich queries on that data, that's going to be a relational database forever. Where I see the interesting boundary is when you've got what I'll call the degenerate use of a relational table, where it is just a lookup on a single primary key, it's a key-value store. And that happens. It's not bad. It's not degenerate in the bad sense, it's degenerate in the computer science sense, not in the ethical sense, where there's a lot more stuff you could be doing but you're not. In those cases, I see things like KTables and Interactive Queries and ksqlDB, the current state of ksqlDB tables and pull queries on those. All that is competitive with relational tables. That's the margin where there's competition. But I see Kafka as being a toolkit for constructing big, giant distributed databases, AKA enterprise applications built out of microservices, each of which has one or more relational databases in it, doing all the glorious things relational databases do.

Rob Jackson:
You did touch on something there which is probably obvious to you and any listeners, but it might be worth just saying that in that use case where you're taking change data capture into topics, you're putting it into, I think you said, databases. And I think that's a really important thing, that you may materialize that into a relational database for some workloads. You might put it into a document store for some web applications. You might put it into a graph database if you want to do some kind of graph workloads on it. You might also put that into Hadoop, into HDFS, if you want to do analytics on it. And I think the point of the NoSQL movement really is that not everything is a relational database, and getting the data into Kafka, where you can then take it into other places for different workloads, is, for us anyway, an important part of the architecture and an important benefit.
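
As a rough illustration of materializing one stream of changes into several stores, here is a small Python sketch. The change events, field names, and the two view-building functions are invented for the example; plain dictionaries stand in for a key-value store and a secondary index, and nothing here reflects how Nationwide actually builds its views.

```python
# Hypothetical change events captured from a system of record.
changes = [
    {"id": "cust-1", "name": "Alice", "city": "Swindon"},
    {"id": "cust-2", "name": "Bob", "city": "Bristol"},
    {"id": "cust-1", "name": "Alice", "city": "Bristol"},  # Alice moves
]


def build_lookup_view(events):
    """Key-value view: the latest record for each customer id."""
    view = {}
    for event in events:
        view[event["id"]] = event
    return view


def build_city_index(events):
    """View for a different workload: customer ids grouped by current city."""
    index = {}
    for customer_id, record in build_lookup_view(events).items():
        index.setdefault(record["city"], set()).add(customer_id)
    return index


# The same events feed both read models; neither one owns the data.
lookup = build_lookup_view(changes)
city_index = build_city_index(changes)
```

Because both views are derived from the same ordered events, either can be rebuilt or replaced by a new one without touching the source system.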

Tim Berglund:
Absolutely. And that's kind of the lightweight definition of CQRS there is that I have this stuff and I have different ways I want to be able to read it. And so go ahead and write it into those different systems. I'll see if I can find a video version of that talk to link to since you mentioned it, and hey, I guess I'll never pass up an opportunity for self promotion, will I?

Rob Jackson:
I've got a link if you need it.

Tim Berglund:
Okay. If you could send me the link, that'd be great. But I talk about, and this is originally Martin Kleppmann's solution, ways to do those multiple writes into multiple sinks because you've got different ways you want to read the data, like the list you just enumerated. There are ways to do that atomically with Kafka that relieve the application of a lot of responsibility that would otherwise be its own.

Tim Berglund:
Final question. We talked about GDPR, which is a regulatory constraint that does some interesting things to you. How about, just broadly, data governance and data lineage? These are things that are strict requirements for organizations like yours and often involve you having to do a certain amount of legwork to get that done. So what's your story been there?

Rob Jackson:
I would say that my data governance colleagues and data architects are used to... yes, we do have regulatory requirements around understanding our data. Clearly it's a good thing to understand what data you have and the lineage of it. That's typically been for our relational databases. So when we started introducing Kafka, the first question from my data architecture colleagues was, well, how do we understand what events we have in the society? And can we start putting metadata alongside them? So they're really interested in understanding the events. Do they contain PCI data? Do they contain customer data? Who's the business owner of that data? And understanding that, and I guess it's data they didn't have before but... Sorry, I'm using the data word too often here. But these were events they didn't have before. They happened in our systems, but they didn't get to see them. We didn't get to record them. So now we record them, they are really interested in them.

Rob Jackson:
Alongside that, people building applications are really interested in what events are out there, what can I start consuming and how can I start using them? So tools to understand events, being able to query across clusters, whether that's on-prem or in the cloud. Those are really interesting to us, but it's early days for us on the uses. So the topic it has too many words.

Tim Berglund:
I know, right. Data topic.

Rob Jackson:
And so it's of great interest to us. We're not there yet. Today the tools are not quite what we want, and we are using things like spreadsheets and existing data governance tools as we can. But it's an area we watch with great interest.

Tim Berglund:
Awesome. So it sounds like headers, just from a technological standpoint. Is it a lot of metadata in headers? Is that the answer there?

Rob Jackson:
Yes, it could be. So clearly you've got your Schema Registry and you've got your Schemas, but being able to add tags, that might be something. But having control over the tags instead of it just being kind of a put whatever data you like in the headers or tags and tag them however you want to. It might be that we want to put some control over that in terms of there's a pre-canned list of things that you can tag these events with.
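
One way to picture a pre-canned list of tags is the Python sketch below. It is only an assumption about how such a control might look: the tag names, the allowed values, and the `validate_headers` helper are invented. The only Kafka fact it leans on is that record headers are key and byte-array pairs, which is why the values are decoded before checking.

```python
# Hypothetical governance vocabulary; tag names and values are invented.
ALLOWED_TAGS = {
    "data-classification": {"public", "internal", "pci", "customer"},
    "business-owner": None,  # any non-empty value is accepted
}


def validate_headers(headers):
    """Return a list of problems found in (key, bytes value) header pairs."""
    problems = []
    for key, raw_value in headers:
        if key not in ALLOWED_TAGS:
            problems.append(f"unknown tag: {key}")
            continue
        value = raw_value.decode("utf-8")
        allowed_values = ALLOWED_TAGS[key]
        if allowed_values is None:
            if not value:
                problems.append(f"empty value for {key}")
        elif value not in allowed_values:
            problems.append(f"invalid value for {key}: {value}")
    return problems


good = [("data-classification", b"pci"), ("business-owner", b"mortgages")]
bad = [("data-classification", b"secret"), ("mood", b"happy")]
good_problems = validate_headers(good)
bad_problems = validate_headers(bad)
```

A check like this could run in a producer wrapper or a review step, so free-form tagging is rejected before events reach a shared topic.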

Tim Berglund:
What's next for Kafka at Nationwide and for you? What are you looking to?

Rob Jackson:
As I said, we've got a couple of use cases live. We've got a number of in-flight projects that are following similar patterns that we've already talked about. So we're building more event-based systems or choreography patterns, and we're still working through the details of when to use orchestration, when to use choreography. Sometimes you might find a bit of both in the system, but it's working our way through those details. The mainframe offload is not really just a mainframe offload. We've got a whole bunch of data silos, we've got a whole bunch of legacy systems and we want to modernize them, we want to take them to the cloud. How do we enable that? So more and more data sources into that architecture where we use things like CDC, or even getting applications to emit events natively. So you kind of see more of the same that we've already been talking about. So at the very least we're doing those things.

Rob Jackson:
There's a lot of interest from my data scientist colleagues, data analytics, in terms of, well, how do I make use of those events? How do I start running analytical jobs off the back of events, much closer to real-time analytics? And then how do we get that information back into the channel applications? So where we work something interesting out about a customer, some behavior or recommendation we can make to them, how do we get that back to the customer? And so then you start thinking about notifications, and Kafka obviously has a big part to play in that. So it's more choreography, more event-driven designs, getting more of our data sources into Kafka and materialized. And then what interesting analytics can we do? And what interesting push notifications can we give to our customers to tell them interesting things? Instead of relying on our customers querying our systems, logging in and checking their balance, how much can we push to the customers, and tell them what they want to hear about?

Tim Berglund:
My guest today has been Rob Jackson. Rob, thanks for being a part of Streaming Audio.

Rob Jackson:
Thank you very much.

Tim Berglund:
Hey, you know what you get for listening till the end? Some free Confluent Cloud. Use the promo code 60PDCAST, that's 6-0-P-D-C-A-S-T, to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeit. There are a limited number of codes available, so don't miss out.

Tim Berglund:
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack sign-up link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.