Today, I'm joined by Matt Gershoff, Co-founder and CEO of Conductrics, a software company specializing in A/B testing, multi-armed bandit techniques, and customer research and survey software. With a strong background in resource economics and artificial intelligence, Matt brings a unique perspective to the conversation, emphasizing simplicity and intentionality in decision-making and data collection.

In this episode, Matt dives into Conductrics' background, the role of A/B testing and experimentation in privacy, data collection at a specific and granular level, and the details of Conductrics' processes. He emphasizes the importance of intentionally collecting data with a clear purpose to avoid unnecessary data accumulation and touches on the value of experimentation in conjunction with data minimization strategies. Matt also discusses his upcoming talk at the PEPR Conference and shares his hopes for what privacy engineers will learn from the event.

Topics Covered:

Matt’s background and how he started A/B testing and experimentation at Conductrics
The major challenges that arise when companies run experiments and how Conductrics works to solve them
Breaking down A/B testing
How being intentional about A/B testing and experimentation supports high-level privacy
The process of the data collection, testing, and experimentation
Collecting the data while minimizing privacy risks
The value of attending the USENIX Conference on Privacy Engineering Practice & Respect (PEPR24) and what to expect from Matt’s talk

Guest Info:

Connect with Matt on LinkedIn
Learn more about Conductrics
Read about George Box's quote, "All models are wrong"
Learn about the PEPR Conference

Matt Gershoff: 0:48

I think the biggest issue and the hardest part of decision making is being intentional; it's being explicit; it's being very clear about what the problem is and then having a principled way of learning about the problem and then taking action. I think most of our value, alongside with the technology, is in that guidance that we give our clients and that we help them be forthright and go into this with their eyes wide open, as opposed to promising them a magic bullet.

Debra J Farber: 1:20

Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding-edge of privacy research and emerging technologies, standards, business models and ecosystems. Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber.

Debra J Farber: 1:56

Today, I'm delighted to welcome my next guest, Matthew Gershoff. He's Co-founder and CEO at Conductrics, a software company that offers integrated A/B testing, multi-armed bandit, and customer research and survey software. While having been exposed to various advanced approaches during his master's degrees in both resource economics and artificial intelligence, Matt's bias is to try to squeeze as much value out of the simplest approach possible and to always be intentional about the marginal cost of complexity. Conductrics has been at the center of many of the most recognized brands' optimization strategies by delivering decision-making at scale, reducing the technical debt of legacy technology and bringing data-informed intelligence and refinement across the enterprise. Marketers, product managers and e-commerce professionals responsible for customer experiences can then continuously optimize the customer journey, while IT departments benefit from the platform's simplicity and ease of integration with the existing technology stack.

Debra J Farber: 3:04

So, I first met Matt at the PEPR conference last year and I thought he had a fascinating niche of expertise. I knew that I had much to learn from him. Today we're going to talk about privacy in the context of A/B testing, experimentation, optimization, machine learning, personalization and multivariate testing. Welcome, Matt.

Matt Gershoff: 3:29

Great to be here, Debra. Thanks for having me. Really excited.

Debra J Farber: 3:30

Yeah, absolutely, and I'm getting excited because PEPR is actually in a few days. It's Friday, May 31st as we're recording this and it's coming up. It's on June 3rd and 4th. I will see you there. But, before we begin our fun discussion today, I did want to address the audience on a personal note. Thank you all for your patience.

Debra J Farber: 3:49

I haven't published a podcast episode in about a month because I've been a little busy planning my wedding. After nine and a half years together, an attempt at getting married in March of 2020 in Mexico, only to be canceled five days before the event due to a brand new COVID pandemic, and a slew of other roadblocks, I finally got to marry my best friend, Mack Staples, on May 24th. That was just seven days ago. It was a magical, joyful, overcast day in the PNW, amongst family and friends and a couple of alpacas, because why not? It didn't rain, despite a forecast calling for 50% chance of showers, and I'm still glowing from the pure, unbridled joy of the day. So, Matt, thank you so much for being my first guest after a short break.

Matt Gershoff: 4:36

Thanks for having me. Congratulations and mazel tov.

Debra J Farber: 4:39

Thank you so much. Thank you so much. It was definitely a blend of my Jewish background and my husband's Celtic background. So, we even had cups that say "mazel tov" on one side and "sláinte" on the other. So, thank you, I appreciate it. All right. So, Matt, you have such an interesting background to me, probably because it's so foreign from the work that I do on a daily basis. Why don't you tell us how you came to focus on A/B testing and experimentation and all the good work you're doing at Conductrics?

Matt Gershoff: 5:10

Well, so academically, what I studied back when I first went to school was, as you said, resource economics, and there's a heavy focus on econometrics. And what econometrics is really about is trying to answer causal inference questions when you can't really run experiments. And so, the main objective, or the main hope, is that one can find a natural experiment so that you can answer, "If something were to have happened, then what would be the outcome?" As opposed to just finding correlations, one is looking for causation. Then, I worked in advertising and database marketing for about five years at a global agency, both in New York and in Paris. There we were using a lot of methods to try to figure out, "If we had some sort of promotional campaign or if we had some sort of ad, would that affect buying behavior?" And so that included TV campaigns as well as direct mail. This was back in the day. This is actually right at the cusp of the internet. I'm a bit older than probably many of your listeners, and so this is around '95 to 2000, right when the internet was starting. But even back then, a lot of the modes of analytic marketing kind of come from that database marketing world, and so we were able to run experiments. An experiment is often called a "randomized controlled trial," and in the parlance of the day now it's often called an A/B test. It's really the same thing. We would run experiments to try to see, "If someone got a particular offer in a mailing, would they be more likely to purchase the product versus someone who got a mailing without the offer, versus someone who didn't get the mailing at all?" And so the main idea is that we would randomly assign people into different experiences and then we would see what the average effect was. And so, you know, I had a lot of applied experience as well as academic experience in that. And then, I was working in a software company during dot-com 1 in Manhattan, and we were there during September 11th. After that, there was sort of a, "Hey, you know, I've always been kind of interested in artificial intelligence," and so I had an early midlife crisis, I think, probably because of just what happened after in New York. You're like, hey, you know, life is short.

Matt Gershoff: 7:38

I went back to graduate school for a Master's at the University of Edinburgh and I studied artificial intelligence, and it was really fascinating there, where I learned about reinforcement learning. Remember, I finally went back in, like, 2005. So, this was really before what I call "the inflationary period" in artificial intelligence. It was really sort of a small community. In fact, most of the folks I would speak to, when they would ask me what I was going to be studying and I said AI, would say, "Well, that's interesting, but what are you going to do with it?" Because at the time it was, like, you know, not really a thing at all.

Matt Gershoff: 8:13

So, there I had learned about reinforcement learning and I was really fascinated by that because what it was is a lot of the theory that we had learned in economics, known as optimal control, which is really when you have a set of sequential decisions, how do you make the optimal decision across a sequence. And, that's kind of a similar problem to reinforcement learning.

Matt Gershoff: 8:40

But, what was nice is that reinforcement learning gave a nice framework for applying some of these marketing ideas. So, that was what gave me the idea for Conductrics back in the day. I had thought (and this was naive, and actually is the reason I think that we really should try to use the simplest thing possible) that I would take an alternative approach. I wanted to use these methods from reinforcement learning and have this for people to use to solve these marketing problems, and it turned out it was just too complicated and it was just too much cognitive load for users to use this type of thing. It was sort of too much to ask. So, we really started to focus more on the A/B testing capabilities of the software and the simpler multi-armed bandit, which is similar to the more general reinforcement learning problem; it's really very similar to an A/B test, except that you want to adaptively change the weights of the different experiences so that you can try to figure out what works in the shortest amount of time.
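
To make the "adaptively change the weights" idea concrete, here is a minimal sketch of one common bandit strategy, Thompson sampling over conversion rates. It is an illustration only: the conversation does not say which bandit algorithm Conductrics uses, and the names `thompson_pick` and `run_bandit`, the two arms, and their true conversion rates are all hypothetical.

```python
import random


def thompson_pick(stats, rng):
    """Sample a plausible conversion rate for each arm from its Beta
    posterior (uniform prior) and return the arm with the highest draw."""
    draws = {
        arm: rng.betavariate(s["wins"] + 1, s["losses"] + 1)
        for arm, s in stats.items()
    }
    return max(draws, key=draws.get)


def run_bandit(true_rates, n_visitors, seed=0):
    """Simulate visitors; each one is routed by Thompson sampling.
    true_rates is a hypothetical ground truth used only for simulation."""
    rng = random.Random(seed)
    stats = {arm: {"wins": 0, "losses": 0} for arm in true_rates}
    for _ in range(n_visitors):
        arm = thompson_pick(stats, rng)
        converted = rng.random() < true_rates[arm]
        stats[arm]["wins" if converted else "losses"] += 1
    return stats


stats = run_bandit({"A": 0.05, "B": 0.15}, n_visitors=5000)
pulls = {arm: s["wins"] + s["losses"] for arm, s in stats.items()}
print(pulls)  # traffic drifts toward the better-performing arm over time
```

Unlike a fixed 50/50 A/B split, the share of traffic each experience receives here changes as evidence accumulates, which is the trade-off described above.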

Debra J Farber: 9:58

That's fascinating. What are some of the major challenges that you're seeing when it comes to companies running experiments? What were the main challenges that you were trying to solve for? And then maybe tell us a little bit about Conductrics and how you're solving for them.

Matt Gershoff: 10:07

Yeah. So I would say, you know, to step back: while we're software as a service and, you know, we are a technology company, I think one of the biggest impediments is magical thinking and a belief that just using technology is going to solve whatever problems you have. And with A/B testing or experimentation programs, the main value, and I think this is a little bit different than maybe how others might look at it, but I really do believe that the main value is that it provides a principled framework for organizations to act and learn intentionally. And so by intentional I mean, like, to take actions with awareness, to do it deliberately, to do it with conscious purpose. And I think one of the risks is that companies sometimes, you know, sort of engage in a bit of magical thinking, and it almost becomes like ritualized behavior. You know, if we do this thing that these other companies do that seem to be successful, like the Metas or, you know, the companies in the Valley, if we take similar actions to the ones they have taken, then we will also get those same types of results. Which, of course, is the antithesis of science, because it's really about the application of the scientific method to decision-making. And good science, even if you go as far back as George Box, of "all models are wrong, but some are useful" fame.

Matt Gershoff: 11:41

I believe it's 1976, "Science and Statistics." It's actually a great article. I recommend anyone to read that. Especially the first few pages are very accessible. It's really this idea that, you know, being a good scientist is to do the simplest thing possible that solves the problem. A mediocre scientist is one who does things for complexity's sake, so they over-parameterize and make things more complex. So I think the biggest issue and the hardest part of decision making is being intentional, is being explicit, is being very clear about what the problem is and then having a principled way of learning about the problem and then taking action. And so I think most of our value, alongside the technology, is in that guidance that we give our clients and that we help them be forthright and go into this with their eyes wide open, as opposed to promising them a magic bullet.

Matt Gershoff: 12:43

I think magical thinking is really sort of the main difficulty at a high level. And then of course there's, like, how to actually implement. You know, there's the technical side of things, which is, you know, kind of detailed, and I'm not sure how interesting that'll be for your listeners.

Matt Gershoff: 13:00

But I think the main thing is that we provide a host of different ways of using our software in conjunction with companies' different tech stacks. So, you know, there can be client-side issues or it could be server side. There's a bunch of ways of doing implementation. But I do think the main thing is that inference, statistical inference, is hard. And by inference I mean, you know, trying to learn based upon observation, which is really what analytics and experimentation is about. It is a difficult problem, and it's important for organizations to be cognizant of that, and they have to do the hard work. And then we're just there to help them implement their ideas.

Debra J Farber: 13:37

That's fascinating. Let's define first, like, what exactly is A/B testing? I know we're talking at a high level and that was really helpful as an intro to the conversation, but if we could dive a little deeper in before we even get to, you know, how privacy is relevant. Let's first define, like, what is A/B testing?

Matt Gershoff: 13:54

Yeah, so there's lots of complicated ways we could talk about it, but I just think, at the bare bones, there's really just two ideas to keep in mind when thinking about A/B testing. A/B testing is just a form of experimentation, and you can think of a clinical trial as an A/B test, right? And so, in your mind's eye, if you're thinking about, say, a vaccine, and you want to learn the efficacy, the marginal efficacy, of a particular vaccine versus, say, placebo, there's two main bits to the A/B test. One is that it needs to have a process for assignment, so we're going to assign different users or different people different treatments in such a way that it blocks what's known as confounding. All that really means is that the user, the person entering into the experiment, should have no say whatsoever in what treatment they get. That could be explicit, like not being able to ask for the vaccine versus the placebo. In the case of digital systems, it might be that, let's say, someone's running an old version of the browser and it is failing to execute one of the treatments properly, so the telemetry does not come back. That missing data is implicitly similar to those types of users opting out of one of the treatments, and so you potentially get skewed results. So you need a system that is very robust in ensuring that the end users, the people who have entered the experiment, do not have any say in what treatment they get, and essentially that's this idea of randomization. So we're going to randomize, the allocation mechanism is going to be random, and that randomization procedure blocks the self-selection bias. What we don't want is, say, if the people who are taking the vaccine tend to be older folks and the people who are not taking the vaccine tend to be younger folks, that might skew the results, because you're going to confound the result of the vaccine, what's known as the treatment effect, with the fact that older folks might have overall worse outcomes, regardless of whether they take the vaccine or not. And so that's what you're really trying to prevent.

Matt Gershoff: 16:20

So that's the first part, and if you're able to block confounding, so there's no confounding, then we can use statistical methods to try to make inferences. Right, and so we need to have a way, since we're doing randomization, so our groups are randomized.

Matt Gershoff: 16:40

We need to not just see which one performs better, naively. So you could just look and say, OK, the vaccine has lowered the incidence rate by some amount, but you also need to take into account the random variation, what we call the standard error, sort of the noise of the experiment. And so there's these two things: one is to block confounding, and two is to have some principled way of removing the background noise so that we can unearth the signal and see whether there's a signal there. And so that's really it. There's really these two things. Now there's a lot of work around what the appropriate statistical method should be, and folks tend to focus a lot on that. But at its core, that's really all you need: just some process to block confounding and then some process to evaluate the treatment effect.
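
The two ingredients described here, random assignment and a noise-aware comparison, can be sketched in a few lines. This is a hedged illustration rather than any vendor's method: it uses a simple two-proportion z-test with an unpooled standard error, and the function names, the simulated conversion rates, and the visitor count are assumptions made up for the example.

```python
import math
import random


def assign(rng, treatments=("A", "B")):
    """Ingredient 1: the visitor has no say; assignment is a coin flip."""
    return rng.choice(treatments)


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Ingredient 2: compare the observed difference to its standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_b - p_a, se, z, p_value


# Simulated experiment with hypothetical true conversion rates.
rng = random.Random(42)
true_rates = {"A": 0.10, "B": 0.12}
counts = {"A": [0, 0], "B": [0, 0]}  # [conversions, visitors]
for _ in range(20000):
    arm = assign(rng)
    counts[arm][1] += 1
    counts[arm][0] += rng.random() < true_rates[arm]

diff, se, z, p_value = two_proportion_z(*counts["A"], *counts["B"])
print(f"difference={diff:.4f} standard_error={se:.4f} z={z:.2f} p={p_value:.4f}")
```

The raw difference alone is the "naive" comparison; dividing by the standard error is what separates signal from the background noise of the experiment.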

Debra J Farber: 17:30

Awesome. Okay, so that's real helpful foundational info for me and I'm hoping for a lot of my listeners as well. And you mentioned a little before the value that companies can realize from experimentation, but I was wondering if you had some examples at a higher level, like if you have experimentation like how does the company overall benefit, rather than you know in an individual use, each individual use case? Like obviously decisions are being made and information is being collected about how decisions are being made and thus the company could take action. But is there like a higher level, like value realized?

Matt Gershoff: 18:05

Well, yeah, I think the higher value is what I think I mentioned before, which is that an organization has a principled way of making decisions, such that they are explicit about what the questions are. So it's like, we want to see which search engine algorithm has greater efficacy, and what is the difference that we care about, like, how much is a meaningful improvement? And just speccing that out, just having a process that the question needs to go through, forces the organization to be explicit at the start. And it's like, well, you know, if no one believes this other search engine algorithm can perform above a certain threshold, and that threshold is not really going to be that valuable to the organization, then it's a way of almost filtering out wasteful activities. So, on the one hand, at a high level, it's a great way of forcing the organization to be explicit, and explicit about trade-offs, and on the other side, I think it's a way of managing risk. So, for example, if I'm a product manager at a media company and, going back to the search engine algorithm, they want to improve the search on their site, they use us to run an experiment with the existing search engine versus alternative search engine algorithms, and then see, based upon certain performance metrics, whether or not the new search engines seem to be performing better, certainly not worse. And if they are above a certain threshold, then they can more safely, with lower risk, switch over to the new search engine algorithm and provide a better experience for their customers. So the whole idea is to have a procedure that makes folks intentional and minimizes the risk of pushing poor experiences to their customers.

Debra J Farber: 20:49

Oh, that makes a lot of sense. So now I'd love to turn to you know how does being intentional about A-B testing and experimentation affect privacy?

Matt Gershoff: 21:01

Yeah, and so we did a major rebuild of the software in, I think it was 2015-ish. You know, we started reading up on GDPR. This is before, I think GDPR was 2018, 2019, I'm not sure, but we had some awareness, especially of Article 25. And then we had read through the Privacy by Design principles, specifically Principle 2, which we built into Conductrics, especially that idea of default. And so we don't want to be paternalistic and tell our customers how they need to use experimentation, but we did want to have this default option, and we wanted our software to have it, so that if you used it, its default condition would be to follow Principle 2.

Matt Gershoff: 21:51

Principle 2, you know, is essentially to minimize the identifiable information that you're collecting, and this is about embedding this in the technology as a design and engineering principle, so that it should be the default behavior, right? And so that means that the customer should gesture or be explicit when they have use cases or situations where they would need to collect more information, which, again, is fine, because it's ultimately their call, it isn't our call. And then also to keep the linking at a minimum, so to minimize the linking of personally identifiable information. And that is interesting because it's in direct opposition to what I usually think of as the "just in case" mindset about data collection. If you're in, like, the data science world or the analytics world, there's this view that you always want to collect more information. It's a maximalist approach, which is interesting: you want to have this 360 view of the individual, so you want to maximize identifiable information. You want to maximize linking that information across the customer so that you can associate events that they've done or metadata about them all across, and that's the default behavior. So, in a way, there's this idea of what we think of as "just in case," which is to collect everything, collect all the data at the finest level of granularity and collect as much data as possible, as opposed to the "just enough," which is privacy by design. It turns out that what's nice is that if you do follow privacy by design, or Principle 2, data minimization, it does relate as well to this intentionality, because you have to think through, well, what is the value of collecting it at this finer level of granularity?

Matt Gershoff: 23:48

Really, if you stop and think about it, a lot of the just-in-case approach of thinking about data collection, and even in Privacy by Design, the original article, they talk about how richer data tends to be more valuable. So, if you have richer data, that's preferable to data that has less information or less granularity in it. And while that is true, it's because it gives you greater degrees of freedom, it gives you optionality. And so really, what tends to happen is that this "collect everything, because why not" is really about an implicit objective, which is about maximizing optionality. Right, so I want to have the option, and that's driven by a couple of things. That's driven by fear. Right, you know, I want to minimize my regret. What if we didn't collect this bit of information, and that was what the boss is going to ask me about? I don't know if the boss is going to ask me about it, but just in case, right? So there's kind of a cover-your-behind type of mentality there. And then there's the magical thinking, which is like, it's that next bit of information that we don't have where there's going to be some sort of huge payoff. We live in some sort of fat-tails world. There's big payoffs that are out there in the shadows, and if we just add more information, if we were able to link some additional information or we had more granularity about the individual, then we're going to have some huge payoff.

Matt Gershoff: 25:20

And that's, I think, is magical thinking. Unless you're explicit, like your goal is to maximize optionality, you really probably shouldn't be doing it. And what I like about, especially because there's a cost. Now there's this shadow price of privacy. You are in opposition to privacy by design, because the default behavior should be data minimization, and that is in these privacy guidelines, which you know much better than I do, but certainly in, I think, article 25 from GDPR as well as 5A. There is a cost to doing it, and so we should be cognizant of why we're collecting data, and so I think what is nice is that they dovetail.

Matt Gershoff: 26:00

Data minimization dovetails nicely with this intentional way of thinking about it, because you need to state up front what information we want to collect and at what level of granularity, and it works well with experimentation in particular. And I first want to say I don't mean to be paternalistic, in that there are many cases where it might make sense to try to collect more, you know. If you have some sort of sense why you might need it and you can make a good rational argument for it, that's totally fine. But what's interesting is, at least for experimentation, we have what I call just-in-time problems. Normally, or a lot of times, in the analytics space we're collecting data for future questions. I'm not exactly sure what the question is going to be, but I want to have this information so that we can do some exploratory analysis, or there might be some sort of additional question that we might have in the future. I'm not sure.

Matt Gershoff: 26:55

Again, that's related to this idea of optionality, whereas experimentation is the opposite. We have the question first, right, we have a hypothesis Is this new search engine better than this other one? Or is this new product better than what we currently have? Or is this new marketing campaign better than not doing anything at all? Or the vaccine example, right? So we have the question and then we need to collect the data for this question. So that's what I mean. It's sort of just in time. We were collecting the data for this question.

Matt Gershoff: 27:27

So we have an explicit task, and so that means that we can have data at the task level as opposed to at the individual level. And so the way our software is built is on this idea of tasks. Each experiment is its own task, and so the data is collected about the task as opposed to the individual. As much as possible, we don't link the individual across tasks; we just keep aggregate data. You can think of it as equivalence classes, and if equivalence classes doesn't resonate, it's just sort of like a pivot table. So the data is stored in aggregate, and we just keep combinations of the treatments, whether someone got an A or someone got a B. That's just implemented in Conductrics. So we just have a small little table with a column which says treatment, and there's an A for one row and there's a B for another row, and we just increment the counts. And then we just increment the conversion counts or the sums of the conversion values. And just as a technical detail, if it's a numeric conversion event like sales, we also increment the sum of the squares of the value. And it just turns out that with those three bits of data, counts, sums and the sums of squares, and that's just a little technical bit,

Matt Gershoff: 28:48

you can do A/B testing, and so you don't need to collect the data at the individual level. You only need to store it at this aggregate level, and that seems to us to be consistent with this idea of privacy by default. It should be what folks are doing if they're following privacy by design, unless they have some other reason for not doing it. But for the task of running an experiment, running an A/B test, that's really all you need, and hence that's probably all you should be collecting. If you need it for some other reason, fine, but for an experiment, you probably don't need to be collecting anything else.
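
As an illustration of why counts, sums, and sums of squares are enough, here is a minimal sketch of a per-treatment aggregate table that is only ever incremented, followed by a Welch-style t-statistic computed from those three numbers. The table layout, the function names, and the choice of Welch's statistic are assumptions for the example; the conversation only states that the three aggregates suffice.

```python
import math
from collections import defaultdict

# One row per treatment; no per-user rows are ever stored.
table = defaultdict(lambda: {"n": 0, "sum": 0.0, "sum_sq": 0.0})


def record(treatment, value):
    """Fold one conversion value (e.g. a sale amount) into the aggregates."""
    row = table[treatment]
    row["n"] += 1
    row["sum"] += value
    row["sum_sq"] += value * value


def mean_and_var(row):
    """Recover the sample mean and variance from the three aggregates."""
    n = row["n"]
    mean = row["sum"] / n
    var = (row["sum_sq"] - n * mean * mean) / (n - 1)
    return mean, var


def welch_t(row_a, row_b):
    """Difference in means divided by its standard error."""
    mean_a, var_a = mean_and_var(row_a)
    mean_b, var_b = mean_and_var(row_b)
    se = math.sqrt(var_a / row_a["n"] + var_b / row_b["n"])
    return (mean_b - mean_a) / se


# Hypothetical sales values, discarded as soon as they are folded in.
for value in (10.0, 12.0, 14.0):
    record("A", value)
for value in (20.0, 22.0, 24.0):
    record("B", value)

t_stat = welch_t(table["A"], table["B"])
print(dict(table), round(t_stat, 3))
```

The individual values never need to be retained: each one is added to three running totals and can then be thrown away, so the stored table stays at one row per treatment no matter how many users enter the experiment.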

Debra J Farber: 29:26

That is a tremendous amount of information that has been enlightening for me personally. Just hearing you talk about that also makes me think, from an operational perspective, that if you have a limited set of data you're collecting because you're being intentional, because you already know the questions you're asking and all that, then as you are addressing privacy risk in an organization you could better audit what has been collected, rather than asking how do you audit when it could be anything and we're collecting everything, you know?

Matt Gershoff: 30:02

That's right, and so each table is scoped, and so in a way it makes it quite easy to look at each of the data structures that are there and what's being stored. So that's from an auditing side, but it also has advantages on the computational side, because as you've collected the data, the data is already in its mostly summarized form, and so there's less computation required to do the analysis. Now, for a simple A/B test like I'm talking about, that's not really such a big deal, but the same idea can be extended to the case of regression. Underneath the hood, a lot of these statistical approaches are really doing a regression problem, and so you can, using the same type of aggregation approach, calculate what's known as ordinary least squares regression. And so that's a technical bit. What's nice is that the fact that you can do this in a very efficient way means that we can now do what's known as multivariate testing, which, for the folks who are more statistical, is more like just a factorial ANOVA. And we can do other things, like evaluate whether or not two A/B tests are interacting with one another, like maybe one A/B test might be interfering with the results of another. The fact that we can do regression in this way means we can have tools that can alert our customers whether or not one A/B test might be interfering with another A/B test.
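
The claim that ordinary least squares can be computed from aggregated data follows from the normal equations: when every user in a cell shares the same categorical predictors, X'X needs only the cell counts and X'y needs only the cell sums. The sketch below shows this with a small hand-written solver; the cell values, the dummy coding, and the function names are hypothetical, and it is not a description of how Conductrics implements it.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (fine for a handful of coefficients)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_from_cells(cells):
    """cells: list of (x_vector, n, sum_y), one entry per aggregate row.
    Builds X'X from counts and X'y from sums, then solves for beta."""
    k = len(cells[0][0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, n, sum_y in cells:
        for i in range(k):
            xty[i] += x[i] * sum_y
            for j in range(k):
                xtx[i][j] += n * x[i] * x[j]
    return solve(xtx, xty)


# x = [intercept, treatment_is_B, loyalty_is_gold]; made-up aggregates.
cells = [
    ([1, 0, 0], 100, 1000.0),  # A, standard: mean 10
    ([1, 1, 0], 100, 1200.0),  # B, standard: mean 12
    ([1, 0, 1], 50, 750.0),    # A, gold:     mean 15
    ([1, 1, 1], 50, 850.0),    # B, gold:     mean 17
]
beta = ols_from_cells(cells)
print([round(b, 6) for b in beta])  # intercept, treatment effect, gold effect
```

Coefficient estimates need only counts and sums; standard errors for those coefficients would additionally need the per-cell sums of squares. Adding a product column for two experiments' treatment indicators is one way such a regression could be used to check whether two tests interact.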

Matt Gershoff: 31:38

We can do things like handle side information, if you are passing it to us. So the data structure need not just be the A and the B; maybe you're passing us categorical information. One of our designs is that you can pass us additional information about the user, but those additional bits of information are limited in that they have to be already categorized. You have to send us categorical information, like segment information, rather than numeric information of arbitrary precision, which is at a very fine level of granularity. You don't want to do that in a data minimization approach. You want to think about what the optimal level of granularity is so that you don't wind up with implicitly unique identifiers, right? We don't want to do that accidentally. You want to manage the entropy of the data that we're collecting. And so we allow our customers to pass along additional information. You might have something like loyalty status, and that might have five different levels, or maybe there's a tenure field, and maybe that's 10 levels, and so that's capped at a certain level of cardinality.

Matt Gershoff: 32:50

This idea of cardinality is important, which is how many unique elements there are in each potential data field. By default, we limit it to 10 unique elements, and, again, that helps constrain how much data you're collecting.
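
As a rough illustration of a cardinality cap, the Python sketch below tracks the distinct values seen for one categorical field and refuses any new value once the cap is reached. The `CappedField` class, its method names, and the `tenure_band` field are hypothetical; only the default of 10 unique elements comes from the conversation.

```python
MAX_CARDINALITY = 10  # the default cap mentioned in the conversation

class CappedField:
    """Tracks the distinct values seen for one categorical field (hypothetical helper)."""

    def __init__(self, name, max_cardinality=MAX_CARDINALITY):
        self.name = name
        self.max_cardinality = max_cardinality
        self.levels = set()

    def accept(self, value):
        """Return True if the value fits under the cap, False if it would exceed it."""
        if value in self.levels:
            return True
        if len(self.levels) >= self.max_cardinality:
            return False
        self.levels.add(value)
        return True

tenure = CappedField("tenure_band")
# Try to register 12 distinct values; only the first 10 are accepted.
results = [tenure.accept(f"band_{i}") for i in range(12)]
print(results)
```

What to do with a rejected value (drop it, or map it to a catch-all level) is a design choice this sketch leaves open.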

Matt Gershoff: 33:07

But it turns out that you can still do regression on this data that's aggregated by treatment and by, say, tenure and loyalty status. And we can constrain and know how large the data structure is, because we already know what data is going into it, unlike when you just collect data at the individual level.
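
One way to see why regression still works on aggregated data: when every regressor is categorical, the normal equations for ordinary least squares depend only on the count and the outcome sum in each cell. The pure-Python sketch below assumes a hypothetical cell layout of `(x_vector, n, sum_y)` and made-up numbers; it is not Conductrics' implementation. Standard errors would additionally need the sum of squared outcomes per cell, which is omitted here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_from_cells(cells):
    """cells: list of (x_vector, n, sum_y), one entry per aggregated row."""
    p = len(cells[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, n, sum_y in cells:
        for i in range(p):
            xty[i] += x[i] * sum_y               # X'y needs only the cell's outcome sum
            for j in range(p):
                xtx[i][j] += n * x[i] * x[j]     # X'X needs only the cell's count
    return solve(xtx, xty)

# Design vector: [intercept, is_treatment_B, is_gold_loyalty]; counts and sums are made up.
cells = [
    ([1, 0, 0], 100, 1000.0),  # A, basic
    ([1, 1, 0], 100, 1200.0),  # B, basic
    ([1, 0, 1], 50, 750.0),    # A, gold
    ([1, 1, 1], 50, 850.0),    # B, gold
]
coefs = ols_from_cells(cells)
print([round(c, 6) for c in coefs])
```

The coefficients match what OLS on the 300 individual rows would return, because every row inside a cell shares the same design vector.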

Matt Gershoff: 33:34

At the individual level, the size of the data structure will be the number of users who enter the experiment, so maybe it's a million or two million rows, whereas here it'll be bounded by the number of treatments times the joint cardinality of the data fields.

Matt Gershoff: 33:51

So if you had a loyalty status with three levels and a login status of logged in or logged out, that's six combinations, and if you had two treatments, that's a total of just 12 rows that you would need to store the data. So it's nice, and what's also nice is that we can take a look at the count for each row and then report back on what's known as the k of k-anonymous data. So we implicitly store the data at a k-anonymous level. And while we're not claiming that k-anonymity is the end-all and be-all for data privacy, and there has been a lot of discussion about that, it is nice that we can use this notion of k-anonymous data as almost a reporting tool for our clients. Back to your notion of auditing, they can inspect and see the finest level of granularity that we actually have across all of the experiments, and that information can easily be surfaced so it can be managed.
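
The arithmetic in that example, and the idea of reporting k, can be sketched in a few lines of Python. The level names and the per-row counts below are invented for illustration, and taking k as the smallest row count is one simple reading of "the k of k-anonymous data" rather than a statement of how Conductrics reports it.

```python
from itertools import product
from math import prod

def max_rows(n_treatments, cardinalities):
    """Upper bound on table size: treatments times the joint cardinality of the fields."""
    return n_treatments * prod(cardinalities)

def k_of_table(counts):
    """Smallest row count in the aggregated table, reported as k."""
    return min(counts.values())

# Two treatments, three loyalty levels, two login states, as in the example above.
rows = max_rows(2, [3, 2])

keys = list(product(["A", "B"],
                    ["bronze", "silver", "gold"],
                    ["logged_in", "logged_out"]))
# Made-up user counts per row, purely for illustration.
counts = {key: 40 + 5 * i for i, key in enumerate(keys)}
k = k_of_table(counts)
print(rows, len(counts), k)
```

Surfacing `k` per experiment gives an auditor a single number for the finest granularity actually stored.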

Debra J Farber: 34:56

Yeah, that is really helpful. I'm struck by the fact that making the constraint through experimentation, collecting only certain data and not all the data, actually helps make the whole process, like regression and testing, more efficient, it sounds like. And with k-anonymity, while there is discussion about how maybe it's not the gold star of anonymity anymore in terms of creating some level of assurance, just to define it: k-anonymity is a property of a data set that indicates the re-identifiability of its records. It does sound like, from what you're saying, that having k-anonymous data at least helps indicate to the team what level of re-identifiability there might still be that they need to account for, if they're taking privacy into account, which I hope they are.

Matt Gershoff: 35:50

Or if it's a particularly sensitive case. Yeah, and a lot of the time this data is not particularly sensitive anyway, and so that's usually why you might get some pushback, like "I don't really get it," especially from U.S. companies. For a lot of the companies in Europe, or companies in the U.S. like the financial institutions, the banks that we have, healthcare, this is a great relief, because then it's easier for their compliance and it's easier to know exactly what's happening, and so it can be extremely useful in those contexts. That's right.

Debra J Farber: 36:21

Well, thank you for that analysis. I think that was really helpful, and I kind of want to turn our conversation to the PEPR '24 conference. PEPR is where we met last year.

Matt Gershoff: 36:32

I'm so excited for it. First of all, it's too bad that this isn't coming out beforehand, because I just want to totally promote PEPR. I had such a great time. I got to meet you last year, but also the speakers and the attendees were just so impressive. Not only did they have such high intellectual capacity and creativity, but it's just such a welcoming vibe. And I'm not really from the privacy space. Again, we've added a lot of these capabilities because we thought it would be good design, and doing so opens up and makes a lot of the system better, and it is consistent with this idea of just being explicit about why we're doing things. But no, I really fell in love with the whole community last year, and I'm really excited to be going again.

Debra J Farber: 37:16

Yeah, I think you underscored why, even like nine days after my wedding, I was like, "I've got to go to PEPR. Sorry, love, I'll see you." People were like, "Why are you here right now and not on a honeymoon?" It's like, well, I couldn't miss PEPR, because I had the same experience. It was all very welcoming.

Debra J Farber: 37:35

I mean, most people don't come into privacy engineering because they were into privacy first. They usually got pulled into it from somewhere else, and yours just happened to be through your expertise in experimentation, optimization, and all of that fun A/B testing stuff. A lot of people will come in and have that one viewpoint, or several viewpoints, of how they are attacking a problem, but they aren't necessarily following the entire space, right? That is very unique, and as someone who follows the entire space, I can tell you there's only a handful of people I've come across who are really that knee-deep. So it was a great opportunity to bring together people who are working on privacy in a technical capacity, maybe in one little area, because it is still very siloed, especially in the research space, right? You know, your homomorphic encryption folks are not exactly talking to your differential privacy data scientists. It's a different world. So what is so lovely about PEPR is the thought-provoking talks. They're short, but they have to be impactful.

Debra J Farber: 38:33

This year I was really excited. I had the opportunity to sit on the program committee and help with talk selection, and then, relatively last minute, I got asked to sub in as moderator of a panel. So I'll be moderating "Privacy Design Patterns for AI Systems: Threats and Protections," and I'm also going to be serving as a room captain for the last three talks, which are focused on threats and engineering challenges. But the reason that I keep coming back is that it is a small conference. I'd say there were maybe 200 people last year. I would hope there are more this year, because everybody who went was talking to their friends and colleagues about how awesome it was.

Debra J Farber: 39:10

But it enabled and facilitated these hallway conversations, except you didn't need to be in the hallway, because it's small enough to have those conversations wherever you are. And, yes, people are super welcoming. There aren't really any vendors. So it really feels focused on the practice of privacy engineering, not the theoretical, which exists at other conferences. So yeah, I agree. I mean, I've been talking it up on my show all year, but I agree it would have been nice if we were able to record this in time to get people excited about PEPR. But I know you're going to be giving a talk under the privacy-preserving analytics section of talks called "Being Intentional: A/B Testing and Data Minimization," and I know we've talked about a lot of that today. So I want to at least pose the question to you: what are you hoping that privacy engineers take away from your talk?

Matt Gershoff: 40:06

Oh well, you know, I really do feel a little bit like an outsider to the community, and so I'm really just trying to give an application. Here is a company out in the world just trying to provide a good product for our customers and help them solve their problems, and here's an example of where we proactively reviewed privacy by design principles and actually tried to implement some of them in our software. And there are these happy, I don't know if they're coincidences, but maybe there are some deeper benefits. There are these additional benefits that you get from privacy, where it's not just a constraint and a blocker, but it can also offer some major advantages when it's appropriate. Now again, everything is a trade-off, and so there may be a context where it's not, but it does have these very nice extra benefits of being more computationally efficient for certain types of cases as well as, as you say, being more parsable or interpretable, so that you can see exactly what data is there, and it's easier to manage. So really we're just going in there to be thought-provoking, coming at it not exactly from privacy, but more like, hey, it turns out that this privacy by design thinking helps with thinking intentionally, which we believe is really the main objective of experimentation. So it's really just giving a different way of looking at it and maybe being thought-provoking for some folks.

Matt Gershoff: 41:42

Like, "Oh yeah, I had been thinking in terms of this just-in-case approach to collecting data in my organization, and maybe we can apply some of these approaches within our orgs, even if we're just acting as the secure curator and then we're going to release data in this way. Maybe I hadn't realized that we can do all of the stats, or most of the stats, I should say, for many of the cases for A/B testing in this sort of k-anonymous form."

Debra J Farber: 42:08

That's awesome, and I definitely think that people will come away with that. I'm really looking forward to your talk. Are there any other talks at PEPR that you are excited about, or topics that you're hoping to learn more about?

Matt Gershoff: 42:20

You mean other than your panel? I mean, so definitely your panel.

Debra J Farber: 42:24

I'm going to be helping to feature the other great speakers as the moderator. But, yes, other than that panel.

Matt Gershoff: 42:30

Yeah, I mean mostly I'm just excited to be around folks.

Matt Gershoff: 42:34

As many of your listeners know, sometimes in the privacy space it's about what you can't do, and there's that procedural approach that we chatted about before we got on, which is just people looking at the checkboxes and trying to prevent folks from doing things. What I love about PEPR is that it's really people trying to be cognizant and respectful, trying to solve the problem and get good outcomes, and so that's the main thing. There are a couple of interesting talks other than your sessions, like "Learning and Unlearning Your Data in Federated Settings" by, I think it's Tamara Bonaci. I apologize to them if I mispronounce their name. That one looks really interesting. As I interpret it, it's about machine learning and trying to figure out how to unpack data that a model may have learned on that maybe you don't want it to have. We're interested in that since we do some of that work with machine learning, so it's important to know how one might be able to unfold that data.

Debra J Farber: 43:33

Awesome. Now, before we close, do you have any words of wisdom for the audience today?

Matt Gershoff: 43:41

The main thing is just to be aware. Again, our whole thing, our world, is focused on hospitality. So have empathy for the folks that you're serving, and do it respectfully and rationally. Part of being rational is to be mindful and cognizant of the trade-offs as well as the unintended consequences, and so I think experimentation, in conjunction with data minimization principles, helps you do that.

Debra J Farber: 44:17

Well, thank you very much, Matt, for joining us today on the Shifting Privacy Left podcast.

Matt Gershoff: 44:23

It's my pleasure and thank you so much for having me. It's quite an honor.

Debra J Farber: 44:26

We'll have you back another time. Until next Tuesday, everyone, when we'll be back with engaging content and another great guest. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And if you're an engineer who cares passionately about privacy, check out Privado, the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.

The Shifting Privacy Left Podcast

S3E12: 'How Intentional Experimentation in A/B Testing Supports Privacy' with Matt Gershoff (Conductrics)
