Fraud Technology Podcast

Episode 4: Balancing Data Flood: A Prudent Approach to Fraud Prevention

Mohammed Topiwalla Season 1 Episode 4

In this episode, we explore the delicate balance between collecting more data to catch fraudsters and being responsible stewards of that data. We ask crucial questions: How valuable is this data, both in the short and long term? Do we truly understand its source and potential benefits? And importantly, what's the cost, and who bears it? Our discussion delves into the decision-making process of whether to acquire external data sets or develop in-house solutions. While external data sets can be beneficial when used judiciously, we also ponder whether the future of fraud prevention lies more in collaborative data sharing among financial institutions. Join us as we navigate the complex terrain of data in the fight against fraud.


Thanks to Mohammed Topiwalla from Wise (https://wise.com/) for the insights. In conversation with Ravi Madavaram from Regulo (https://www.regulo.ai/).

Welcome back, listeners. This is Ravi from Fraud Technology, and today we have the privilege of hosting Mohammed Topiwalla, a seasoned data scientist in the fraud domain. He started as a data analyst and is currently a fraud data scientist, a coveted job that most of us would love to have. He has also been working with Wise for the last four years, based out of London, which is also something I envy about him. Join us as he shares his expertise and insights on the complexities of implementing fraud models in large financial institutions. Today's podcast in particular will delve deeper into the technical side of fraud model development and deployment. And with that, let me welcome Mohammed Topiwalla to the podcast. Welcome, Mohammed.

Hi, thanks for that really incredible introduction, Ravi. It's great to be here.

So the first thing I want to start with is something I always think about when it comes to fraud, or really any technology solution that generates alerts: the problem of false positives. I want to understand, first, whether false positives really are a problem, and second, whether there is a technical way to solve the false positive problem itself.

Yeah, of course. False positives in any risk prevention field can be extremely expensive, and the reason is that whenever an alert is generated by any form of detection or preventative mechanism, there needs to be someone at the other end who manually reviews it and takes a final action on it. Most of the geographies that risk institutions operate in have a requirement that this final decision cannot be automated, which means there needs to be a human being at the end of it reviewing it. And we have a limited number of human beings. That is why false positives are so expensive.

If you look at false positives as a whole from any risk prevention institution's point of view, I think there are two ways we should always look at it: in the short term and in the long term. In the short term, what's our goal? We just want to quickly plug the gap without completely overwhelming the people who are actually sitting and reviewing these alerts. We don't want them working overnight. So we employ some short-term strategies, and these could be something as simple as creating a whitelist. We have all heard about people creating blacklists, where we put the riskier data points that we immediately exclude. A whitelist is the opposite: you analyze and put trusted data points in there, and if something matches those trusted data points, you just let it go through and don't give it the chance to become something a human being has to review at the end.

Another thing we can always do is check where exactly the model is performing badly. It's quite possible, at least in the financial universe, that a model performs really well where, let's say, the invoice value is really high, say £10,000. The model might catch that fraud really well, but the moment the transaction amount is below £10,000, it might not be doing well at all and might generate a lot of false positives. So we can create something known as a sub-rule. A sub-rule is nothing but acting on the model score in a slightly different way; in this example, the differing component is the condition on the invoice value. If the invoice value is higher, you penalize it, and if the invoice value is lower, you let it go, because you know you have higher false positives there. I think this combination of strategies can be really good in the short term to plug the gap.
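To make the sub-rule idea concrete, here is a minimal Python sketch, assuming a hypothetical model score between 0 and 1, a hypothetical whitelist of trusted account IDs, and the illustrative £10,000 boundary from the conversation. The field names and thresholds are made up for illustration, not Wise's actual configuration.

```python
# Illustrative sketch of a whitelist check plus an amount-conditioned sub-rule.
# All names and thresholds are hypothetical.

TRUSTED_ACCOUNTS = {"acc_123", "acc_456"}   # whitelist of trusted data points

# Different alerting thresholds on the same model score, depending on the
# segment where the model is known to produce more false positives.
HIGH_VALUE_THRESHOLD = 0.60   # stricter where the model performs well
LOW_VALUE_THRESHOLD = 0.85    # more lenient where false positives are high

def should_alert(account_id: str, amount_gbp: float, model_score: float) -> bool:
    """Return True if the transaction should be sent for manual review."""
    if account_id in TRUSTED_ACCOUNTS:
        return False  # whitelisted: never becomes work for a human reviewer
    if amount_gbp >= 10_000:
        return model_score >= HIGH_VALUE_THRESHOLD
    return model_score >= LOW_VALUE_THRESHOLD

# Example: same score, different outcome depending on the sub-rule segment.
print(should_alert("acc_999", 12_000, 0.7))  # True  (high-value segment)
print(should_alert("acc_999", 2_000, 0.7))   # False (low-value segment)
```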
But of course, once we've plugged the gap in the short term, we should always think about how to prevent it from happening again, which is quite difficult but worth a shot. On the long-term side, the first question is: do we have a big gap between the model's training environment and the real world where the model is actually deployed and tested? The funny thing about any risk prevention field is that you have very few bad actors, which is exactly what you want, and a lot of good people, which is again exactly what you want. But it's really bad from a data science perspective, because the representation of those bad customers is really low. So quite a few of us employ different strategies: we sometimes oversample the bad guys, and sometimes we downsample the good guys. And sometimes, while doing this, we forget to check ourselves, because the model is trained on this artificial data set while in reality it sees a different distribution of data. So are we testing it correctly?
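As a rough illustration of that point, here is a small Python sketch, assuming scikit-learn and a pandas DataFrame with a binary `is_fraud` label; the key detail is that the rebalancing (oversampling the rare fraud class) is applied only to the training split, while the evaluation happens on an untouched holdout that keeps the real-world class imbalance. The column names are hypothetical.

```python
# Sketch: rebalance only the training data, evaluate on the real distribution.
# Assumes a pandas DataFrame `df` with feature columns and a binary `is_fraud` label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score

def train_with_oversampling(df: pd.DataFrame, feature_cols: list[str]):
    train, holdout = train_test_split(
        df, test_size=0.3, stratify=df["is_fraud"], random_state=42
    )

    # Oversample the rare fraud class in the TRAINING split only.
    fraud = train[train["is_fraud"] == 1]
    genuine = train[train["is_fraud"] == 0]
    fraud_upsampled = fraud.sample(len(genuine), replace=True, random_state=42)
    balanced_train = pd.concat([genuine, fraud_upsampled])

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(balanced_train[feature_cols], balanced_train["is_fraud"])

    # The holdout keeps the true (highly imbalanced) distribution, so the metric
    # reflects what the model will actually see in production.
    scores = model.predict_proba(holdout[feature_cols])[:, 1]
    print("AUCPR on realistic holdout:",
          average_precision_score(holdout["is_fraud"], scores))
    return model
```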
Okay, that's really wonderful. Some of these are problems I've known about before, and it's fascinating the way you talked about model training, oversampling, and the issues that brings. I do want to get into that. But before that, you mentioned the regulators requiring an alert to be cleared by a human. One thing you can always do there, and it raises the question of what an alert even means, is set the threshold in such a way that you have fewer alerts. That way you need fewer people, and that's a control we have. So what does an alert even mean? I'm sure the regulators aren't saying this is the threshold you need to set; they only say an alert needs to be cleared by a human. Why wouldn't we just set the threshold so high that we have very few alerts?

Yeah, that's a great question. That would actually be the best way to game the system, but it would defeat the purpose of the whole process. Put simply, if we read these regulations thoroughly, they are set for the good of our customers. What they are trying to achieve is not to ensure that you review, let's say, 10,000 alerts in a day, but rather to see whether you are able to stay under the risk appetite set by the regulators. Are you able to ensure that the good customers on your platform, and across the financial institution, are not being affected by these harmful customers? So when we are setting a boundary for where to create these alerts and where not to, what we need to take into consideration is how many of these bad guys we need to stop, at what cost to our good customers, and at what cost to the agents who are manually reviewing these backlogs. And if you are able to prove, in a proper way, that you are properly monitoring the amount of risk you are stopping, and that it is within the appetite the regulators define and within the appetite of the partners we work with, I think that's the sweet spot we want to be in.

Got it, got it. That's really fascinating. So basically what you're saying is: whatever risk management program we have, we should be able to achieve those numbers, and the thresholds and everything else are a function of how to achieve the objectives set in that risk management program. That makes perfect sense.

So the other part of the earlier conversation was about the models themselves and how oversampling is done. Now, I want to make it a little more complicated. You talked about a threshold, like a £10,000 threshold. One dimension is the amount itself, but you also have geography, you also have product, and you may have other parameters coming into the picture. When you slice and dice your transactions like that, the sample of fraud you have to train on goes down further and further, so your ability even to train models becomes challenging. How do you decide at what level of aggregation to build a model? You could build one generic model that covers all products, all geographies, all amounts and various other parameters. Or you could say a particular parameter is important to you; transaction amount is probably a good indicator, since the amount you lose directly correlates to the loss you make, so that is one way to split models. How do you decide whether to build many models, or whether one aggregated model is enough? Because the more you split, the more your sampling problems come up; when you aggregate, your false positive problems come up. How do you decide?

Yeah, and the answer to this is, again, maybe a two-part answer. The first thing is that we need to understand what kind of product it is and in what geographies we wish to operate. If we understand that, we can understand the regulations set by the regulators in each specific geography. Do they have any specific requirement about checking our customers' risk in that particular region? It can differ region to region, and sometimes the answer is stated very clearly just by reading those regulations. The second thing is to always try to start generic: have just one model. If that isn't really working, then think about it as one model with multiple sub-groups. And if you can't manage it with one model, then figure out the places where your model differs the most, where it isn't able to perform, or just take a look at your data and see where it stops being uniform, and on that basis decide whether you now need two models or three models. Or maybe it's because the product offering itself changes a bit across the different groups you serve, and that on its own can be an indicator that you need more models. And because we're talking about different geographies, something else that can be really important is whether there is a financial vulnerability within a geography that someone could exploit which is unique to that geography. Just as an example, in the US you have direct debits as a system, and in India you have UPI; they're completely different payment methods, and each of them has its own financial vulnerabilities. Trying to have one solution that fits them all would really not be a good idea, and in that case you might, just thinking logically, have to create two models.
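One way to ground that "start generic, then split" decision is to measure how a single model performs across candidate segments before committing to separate models. A minimal sketch, assuming a pandas DataFrame of already-scored historical transactions with hypothetical `product`, `geography`, `is_fraud`, and `model_score` columns:

```python
# Sketch: check where one generic model underperforms by segment.
# Column names are hypothetical; the metric is AUCPR (average precision),
# which suits the heavy class imbalance in fraud data.
import pandas as pd
from sklearn.metrics import average_precision_score

def aucpr_by_segment(scored: pd.DataFrame, segment_col: str) -> pd.Series:
    """AUCPR of an already-scored dataset, computed per segment value."""
    results = {}
    for value, group in scored.groupby(segment_col):
        if group["is_fraud"].nunique() < 2:
            continue  # segment has no fraud (or no genuine) cases; metric undefined
        results[value] = average_precision_score(group["is_fraud"],
                                                 group["model_score"])
    return pd.Series(results).sort_values()

# Usage idea: segments whose AUCPR sits far below the overall number are the
# candidates for a sub-rule, a sub-group, or eventually their own model.
# print(aucpr_by_segment(scored, "product"))
# print(aucpr_by_segment(scored, "geography"))
```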
Okay, understood. So just to summarize, if I understood correctly: one option is to have a single model and then use rules on top of it. Say you have one model and three products, and you say this product is more risky, so my threshold is going to be lower so that I review more alerts; this other product is only hundred-dollar transactions, the risk for me is probably lower, so I set the threshold very high and accept fewer alerts. Same model, but you're tuning it through the rules you set on top.

Exactly.

And in the example you gave about the US and India, because the ecosystems are so different, sometimes a shared model may not even be meaningful, because the payment rails are very, very different. Okay, perfect. The other thing you touched upon, which I'd like to understand a little more, is regulations defining fraud requirements. I haven't seen regulations specifically about fraud, and I'd love to know where it is even covered. Not the specifics, but what kind of regulations talk about fraud itself?

Yeah, that's a great question. For example, in the UK there is a fairly new set of rules being pushed by the regulator, known as the Consumer Duty, which mainly spells out what duty financial institutions have towards their customers. It doesn't specifically talk about fraud, but it does set out under what parameters we can run different risk prevention checks on our customers, and how we should treat customers fairly, so that no individual is given different treatment. How do we ensure that we are being fair, that we are being transparent, that we are giving them exactly the right information at all points in time? So it isn't one single document, and it isn't black and white; it's a set of reference guidelines that together build the picture. We then have to make the call and, at the end of the day, be answerable first to our customers, because they are our end consumers, and then to the geography we operate in.

Okay, I understand. The other thing we talked about was the number of models. In a very large financial institution, how many models are we typically dealing with? You've been a data scientist, and I'm assuming that in a particular team there are several data scientists. I just want a sense of the magnitude: is it 10 models, 100 models, or 1,000 models?

In any financial institution, the number of models can escalate quite quickly. But before letting that number escalate, I think it's important to know how you are going to maintain so many models.
So, to give a very generic number, I think it would be somewhere between 10 and 50, depending on the number of offerings. And then, the bigger the customer base and the bigger the geographic footprint, the more easily it escalates into maybe even hundreds.

Okay. The reason I wanted to understand this is that I know credit risk, for example. When you have, let's say, mortgage products, you typically have a credit risk model per product, and it can end up being a really messy affair to manage all those models. While it sounds sexy to have more models, the nitty-gritty of actually doing it day in and day out can get really complicated. So I wanted to go a little deeper on that side: how complex is it from an engineering perspective? One part is developing the model and training it; then you deploy it, you have to maintain it, you need to monitor it. What kind of ecosystem do we need to run even, say, a ten-model fraud ecosystem, for a data scientist like you who is managing it?

Yeah, so I'm lucky in that Wise is a strong believer in automation. I honestly cannot imagine being in an environment where the production model infrastructure is not automated. If you think about what it takes for a model to go to production, what are the steps? We need to gather data for the model, we need to gather labels, we need to train it, test it, do some form of validation, approval and documentation, and then it goes to production. And just after production there's another step, monitoring and governance, to ensure it's working exactly the way you expected it to. At Wise, thanks to the compute environments we have and orchestration tools like Airflow, we can orchestrate the data gathering, the label gathering, and the heavy processing that turns it all into a training data set; then we use Amazon SageMaker to pick up that training data and train the model; then a fairly simple PySpark script running on EMR does all the validations, and we log all of those outputs directly to MLflow. Once you have that, most of it is already taken care of. The only thing left is for a human being to go in, see how the model is doing, write a nice little document, create a nice GitHub pull request, and that's it: the model is ready to go to production. So for any newer financial institution, something more like a startup, I think it's quite important to set up these foundations early, especially if they know the number of models is going to increase. And for the older financial institutions, this is the mentality I hope they eventually get into, because otherwise someone is going to get very frustrated at the end of the day.
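To make the orchestration idea a bit more tangible, here is a heavily simplified Airflow DAG sketch of a pipeline like the one described: gather data and labels, build the training set, train on SageMaker, validate with Spark on EMR, and log to MLflow. The task names and the placeholder functions are hypothetical; a real pipeline would use Airflow's SageMaker/EMR operators and proper credentials rather than these stubs.

```python
# Minimal Airflow DAG sketch of a batch model-training pipeline.
# Task bodies are placeholder stubs; names and schedule are illustrative only.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def gather_data(**_):        ...  # pull raw transaction data
def gather_labels(**_):      ...  # pull confirmed fraud / genuine labels
def build_training_set(**_): ...  # join, clean, and feature-engineer
def train_model(**_):        ...  # kick off a SageMaker training job
def validate_model(**_):     ...  # run validation checks (e.g. PySpark on EMR)
def log_to_mlflow(**_):      ...  # log metrics, params and artifacts to MLflow

with DAG(
    dag_id="fraud_model_training",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",          # the refresh cadence is a tunable choice
    catchup=False,
) as dag:
    steps = [
        PythonOperator(task_id=name, python_callable=fn)
        for name, fn in [
            ("gather_data", gather_data),
            ("gather_labels", gather_labels),
            ("build_training_set", build_training_set),
            ("train_model", train_model),
            ("validate_model", validate_model),
            ("log_to_mlflow", log_to_mlflow),
        ]
    ]
    # Chain the tasks sequentially; after this, a human reviews the results,
    # writes the documentation, and opens the pull request.
    for upstream, downstream in zip(steps, steps[1:]):
        upstream >> downstream
```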
Yeah, it's funny that you mention SageMaker, because our CTO actually comes from AWS and is an expert in SageMaker; he talks about it all the time. Now, you've described the ecosystem for training a model and getting it ready for production. There's also the data pipeline side. While you're preparing the data for training, that's something you do once, offline; you're not building a real-time pipeline, so to say. But for production you need a real-time pipeline of data flowing in. So how do you go about that? It sounds like a bit more than just training, which you can do as a one-off. What do you do at the production level? You probably need heavy engineering expertise as well.

So for the training, actually, it's not a one-off; it's a batch job that runs for us regularly, so luckily we don't have to go back every week and figure out how to get the data again. But with regards to production, you're right: you need the data in fractions of a second, because that's the speed at which a transfer is actually going out. So it's a massive engineering challenge. Each of these models requires hundreds, maybe even thousands, of data points if you have a lot of models. So the challenge is that you need to gather all of this data in fractions of a second; and the even bigger challenge is not just gathering it, not just creating a pipeline to get this data, but making that pipeline extremely fast in production.

So what you're saying is: there's a timeline to train the model and get it production-ready, but there's also the engineering side of getting the pipeline ready, so that when you deploy, the data is coming in in real time. Okay, cool.
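As an illustration of that latency constraint, here is a small Python sketch of one common pattern (not necessarily the one used at Wise): features are precomputed in batch or by stream processing and kept in a low-latency key-value store, so that at decision time the scoring service only does fast lookups instead of heavy queries. The Redis keys, feature names, and latency budget are all hypothetical.

```python
# Sketch: assembling model features within a tight latency budget by reading
# precomputed values from a key-value store (Redis here, purely as an example).
import time
import redis

FEATURE_NAMES = ["txn_count_24h", "avg_amount_30d", "new_device_flag"]  # hypothetical
LATENCY_BUDGET_MS = 50  # illustrative per-decision budget

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_features(customer_id: str) -> dict[str, float]:
    """Fetch precomputed features for one customer with a single round trip."""
    start = time.perf_counter()
    raw = r.hgetall(f"features:{customer_id}")  # one hash lookup, not N queries
    features = {name: float(raw.get(name, 0.0)) for name in FEATURE_NAMES}

    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        # In a real system this would feed monitoring/alerting, not just a print.
        print(f"feature fetch exceeded budget: {elapsed_ms:.1f} ms")
    return features
```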
So the other thing I wanted to understand a little more: when you're training the models, you talked about oversampling. I've worked on the telecom side of things, and a lot of the time our models would say someone is a good customer because we only had a small slice of that customer's behavior. Sometimes, even within our own ecosystem, we had other data, but it wasn't real time, so we didn't use it in the model. And we used to have the problem where we thought someone was a good customer, but in reality they weren't. So how do the models cope, knowing full well that what you see about the customer may only be a small slice of their behavior?

Yes, excellent question. It's a risk that comes with the environment we operate in, because it's just physically impossible to train a model on all of your data; otherwise we would have to call in Amazon and ask them to spin up a multi-million-dollar machine. So it depends again on the use case. When we're talking about models, I think the key factor to remember is: let's not make a decision on the output of just one model. There are different attributes of our customers. We could have a specific anomaly detection model that just looks for any form of anomalous behavior and feeds a risk score into the main model that makes the decision. Different attributes of a customer can feed in, such as the geography they're in; each of them can be an important help in making these decisions. So it's a bunch of these data points that we need to take into consideration together before giving out any prediction. And because we in risk prevention teams mainly focus on models that catch the bad guys, we can also think in the inverse manner: we could have models that try to lift the goodness score of a customer. Why not have a goodness score, make use of both scores together, and find the right balance between the two?

So I'm assuming you'll have multiple models, a scorecard at the end of it with weightings, and then you calculate a final score. Okay, that makes sense. We used to do that in the telecom world too, and I'm guessing that's the same practice here.
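A tiny sketch of that kind of score combination, assuming hypothetical weights and three sub-scores (a main fraud model, an anomaly detector, and a "goodness" model); real systems would calibrate these scores and weights rather than hand-pick them as done here.

```python
# Sketch: combining several model outputs into one decision score via a weighted
# scorecard. Weights and threshold are illustrative, not calibrated values.

WEIGHTS = {
    "fraud_model": 0.6,     # main supervised fraud model
    "anomaly_model": 0.25,  # unsupervised anomaly score
    "goodness_model": -0.3, # evidence of established good behavior lowers risk
}
ALERT_THRESHOLD = 0.5

def combined_risk(scores: dict[str, float]) -> float:
    """Weighted sum of sub-scores, each assumed to be scaled to [0, 1]."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)

scores = {"fraud_model": 0.8, "anomaly_model": 0.4, "goodness_model": 0.9}
risk = combined_risk(scores)
print(risk, "-> alert" if risk >= ALERT_THRESHOLD else "-> pass")
```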
Do you also use external data sets? How helpful are they? I know there's quite a bit of data out there. How do you decide which data sets to use? Sometimes it can feel like more data is always good, so how do you draw the line between a data set that's worth it and one that probably isn't?

I might be saying this a lot, but it really does depend on the use case. In the race to catch fraudsters, it's very difficult to say no to more data coming in. But at the end of the day we need to hold ourselves responsible and ask: how beneficial is this data, really, in the short term and the long term? And more importantly, do we fully understand what this data is and where it's coming from? Can we explain the source the provider is gathering it from? If yes, and we understand the benefits we could derive from using it, then we also need to ask questions about the cost, because today it might be reasonable, but in the longer term it might not be, and at the end of the day your customers would be paying the price for that expensive data. So, is it something you could build in-house, or is it something you genuinely can't have yourself, where it makes sense to get it externally? External data sets do help if we understand their benefits, understand our use case, and know we're using them for a particular purpose. But at the end of the day, I think the thing that will help most financial institutions going forward will not be external data sets; it will be sharing data, which has already started to open up in some geographies. Data sharing among financial institutions will be the biggest bet.

Okay, I understand. So that's about external data sets as input to your model. Sometimes when you're launching in a new country, you don't have enough good/bad labels, or any labels at all, to train on. How do you go about developing a model in that situation?

Again, there are two ways to look at it; it depends. If you're operating in a market that's reasonably close to the new one, maybe you can just reuse your existing model there, let it run for three months, look at its performance, and then start splitting out that market's own data and retraining the model on it. But if you're going completely blind into a new market and need to start with a risk strategy, then you're dependent on external providers for data and for labels, and a cost-benefit analysis comes in again. Should you hire a vendor who is already well established in the region to do these checks for you? Or do you want to invest in the data yourself, which also means understanding where you're getting it from and whether you will eventually generate, on your own, data similar to what you would be buying from these one-off vendors?

Okay, got it. And once you've trained and deployed, how do you monitor how the model is performing, and how frequently? Models can deteriorate over time, and the data sources coming in through the pipeline can also shift; the meaning of that data may not change drastically, but it can change slowly. So how do you monitor it, and how frequently does one have to look at and measure performance? Is it continuous monitoring, or something you do on a schedule?

Yeah. I think what you're referring to, features slowly changing over time, is essentially feature drift.

That's a term I hadn't heard before, but it sounds sexy. Feature drift, okay.

With regards to monitoring: yes, you have real-time monitoring just to check the model isn't massively misbehaving in production. Is it generating more alerts than you would expect, or is it under-generating alerts? It's always good to have this in the production environment with on-call alerts set on it, so someone can act immediately if something has gone off in production. But over the longer run you also want to monitor the model's historical performance. How has it done? How has it operated over weeks, over a period of time? Does it still have the same AUCPR you saw while testing it? One very good way to understand how good your data is and how quickly a model degrades is to go, say, six months back in time, train a model on the data up to that point, and then plot the AUCPR week on week since then. If you see a graceful degradation, you know you've trained a good model, one that degrades gracefully over time. If instead you see a sharp drop, the model has probably overfitted or something else is wrong, and it isn't degrading gracefully across time. If you notice that, first of all you need to fix whatever is wrong in training, but it's also a very good indicator of the refresh cadence you actually need: whether two weeks, three weeks, one week, or maybe even one day is the sweet spot for refreshing the model.
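That back-test can be expressed in a few lines. A minimal sketch, assuming a pandas DataFrame of historical transactions with a timestamp, feature columns, a label, and a model trained only on data older than a chosen cutoff; the column names and the cutoff are illustrative.

```python
# Sketch: train on data up to a cutoff, then plot week-on-week AUCPR afterwards
# to see how gracefully (or not) the model degrades.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score

def weekly_degradation(df: pd.DataFrame, feature_cols: list[str],
                       cutoff: str = "2024-01-01") -> pd.Series:
    df = df.sort_values("timestamp")
    past = df[df["timestamp"] < cutoff]
    future = df[df["timestamp"] >= cutoff]

    model = GradientBoostingClassifier()
    model.fit(past[feature_cols], past["is_fraud"])

    aucpr_by_week = {}
    for week, group in future.groupby(pd.Grouper(key="timestamp", freq="W")):
        if group["is_fraud"].nunique() < 2:
            continue  # no fraud (or no genuine) cases that week
        scores = model.predict_proba(group[feature_cols])[:, 1]
        aucpr_by_week[week] = average_precision_score(group["is_fraud"], scores)

    # A slow, steady decline suggests a healthy model and tells you how often to
    # refresh; a sudden collapse points at overfitting or a training problem.
    return pd.Series(aucpr_by_week)
```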
Okay, this makes me a little curious. I remember when COVID first hit, in March or April of 2020; I was actually working on the credit side of things at the time. We had a lot of credit models, and suddenly behavior changed so substantially and so drastically within a few weeks that our models were going bonkers, because the input data was changing so much. I don't know if you experienced that; is there a story you can share with us about it?

Yes, that's actually a really common phenomenon; I think it happened to quite a few financial companies back then. Something similar happened during the COVID period. It forced a lot of people to establish a presence online; people who had never banked online actually came online, so there was a massive bump in volume. At that time, as you said, the models were going a bit bonkers, but for us it was mostly the volume, with the data changing massively over time. What we noticed was a rise in crime, because a lot of people had also lost their jobs, and whenever there's turbulence in the economy, crime is on the rise. At the same time there was a rise in good customers constantly joining. Luckily for us, because the trend was the same for the good customers and the fraudsters, all we needed to do was increase the frequency of our model refreshes. And that was made visible by monitoring how much our features were drifting across time. For example, for a particular feature, has the average changed by X percent? Maybe that average changing by X percent is our indicator that it's time to refresh the model: we used to wait two weeks, but maybe now we need to do it right away. Or maybe we're generating far too many alerts, in which case it's again time to refresh the models and perhaps readjust the thresholds a bit.
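Here is a small sketch of that kind of drift check, comparing the recent mean of each feature against its mean in the training window and flagging anything that has shifted by more than a chosen percentage; the 20% threshold and column handling are illustrative assumptions, not a specific production rule.

```python
# Sketch: flag features whose recent average has drifted by more than X% from
# the average seen in the training window. The 20% threshold is illustrative.
import pandas as pd

def drifted_features(train_df: pd.DataFrame, recent_df: pd.DataFrame,
                     feature_cols: list[str],
                     max_shift_pct: float = 20.0) -> list[str]:
    flagged = []
    for col in feature_cols:
        baseline = train_df[col].mean()
        if baseline == 0:
            continue  # avoid division by zero; handle such features separately
        shift_pct = abs(recent_df[col].mean() - baseline) / abs(baseline) * 100
        if shift_pct > max_shift_pct:
            flagged.append(col)
    return flagged

# If this list is non-empty (or alert volumes look off), it may be time to
# refresh the model earlier than the usual cadence.
```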
Got it, got it. And in your personal job, how do you split your time between model development, model deployment, and model monitoring? I'm assuming there are other tasks as well. How does it break down in percentage terms?

I would say I spend around 30 percent of my time on model governance, another 40 percent on model development, and about 30 percent on model deployment. And the reason model deployment takes time is not that there's an engineering challenge, but because we are extremely thorough about the documentation process around model deployments. So it's roughly a third, a third, a third.

And when you say model governance, you're talking about monitoring?

Monitoring, yes.

Okay, perfect. I'm going to change tack here. With all the expertise that exists in fraud, I still see, and it's probably well noted by everyone, that fraud is growing quite rapidly. What do you think are the core reasons fraud keeps growing in spite of so much technology and so many people focused on this particular problem?

In my opinion, it's simply the increase in people who had never used online services before. There has been a massive surge since COVID of people starting to use online services. For example, my father, just as a very random example, had never paid a single electricity bill by any payment method other than checks, and during COVID he was forced to use online applications to pay it. It's a very random example, but there must be a lot of people like that who were pushed into using these online tools. And in response, fraudsters evolved into slightly more sophisticated tactics. They went from what you could dub stupid fraud to more sophisticated schemes, where you and I could also very easily be fooled: setting up call centers, looking really professional, calling people up and scaring them. I think that's what has led to the growth of fraud. And fraud has also spread by word of mouth, in a way; at least that's my opinion. Because people understood how easy it was to make money, it has increased quite a bit. On YouTube there are so many channels constantly focusing on these scam call centers, and it's surprising to realize there are buildings filled with people who, day in and day out, are just calling people and trying to make money off them.

I understand, I understand. Okay, my final question. What is your biggest frustration when you're fighting fraud? Sometimes the frustration is something that should happen but hasn't. What's your biggest frustration, or the thing you think should happen but hasn't happened?

Very difficult question. Working in any risk prevention team, I feel my biggest frustration has come from sometimes not listening closely enough to what my peers in the team are saying. What do I mean by that? The people who review the alerts are, in a way, the first line of defense: they are the ones actually looking at what fraud looks like. So it's a very important source of learning, not just for a single individual but for all of us working in the risk prevention team, to actually listen to these people and see what they believe are the biggest risk indicators. If they have reviewed something the machine learning models couldn't pick up on, it's important to sit down with them and ask: what are the risk indicators here? Then take their learnings and plug them back into the model; it could be a feature, it could be a static rule, it could be anything. It's very important to maintain that bridge. In the initial phase of my career, that's what took me time to understand, and that was my frustration. Knowing now that this communication always has to be there has made my job so much more exciting, so much more collaborative, and so much more impactful.

Okay, thank you so much, Mohammed. That really was insightful, and I hope the collaboration you talked about is something we see more and more of across the ecosystem. Thank you so much, Mohammed. And to all our listeners, please do subscribe to our podcast. You can also watch us on YouTube, and subscribe to our LinkedIn page. Thank you so much.