Real World Serverless with theburningmonk
#24: Serverless at Stedi with Zack Kanter
You can find Zack on Twitter as @zackkanter.
Here is the ARC410 session from re:Invent 2019 that Zack mentioned in the show.
For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.
Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0
Yan Cui: 00:12
Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today, I'm joined by Zack from Stedi. Hi, welcome to the show.
Zack Kanter: 00:24
Hi, thanks so much for having me.
Yan Cui: 00:25
So can you tell us a bit about yourself and your experience with AWS and serverless?
Zack Kanter: 00:31
Sure. So I'm going to start with telling you a little bit about Stedi and what exactly it is that we're building here.
Yan Cui: 00:36
Sure. Sounds good.
Zack Kanter: 00:38
So the easiest way to think about what we're building is that we're building this missing, untouched piece of infrastructure, which is a network for global businesses. If you think about how you might send transactions back and forth between two businesses, it's not as straightforward as you might think. Of course, there's the catch-all of sending PDFs over email. So if you've sent a consulting invoice to someone, you've probably exported it from QuickBooks or Xero and sent it out via email. But if you need to do this at a high volume, and send maybe hundreds or thousands of invoices a day or a week or a month, it starts to get pretty tedious. So Stedi is what we call a messaging platform for B2B trade. What we allow people to do is create an organisation on our network, add users to that organisation, and then exchange transactions back and forth with other organisations in 300 different message types that describe all the different ways that companies can do business with each other. Those might be things like invoices, ship notices or purchase orders. In the healthcare space, it could be some sort of healthcare claim or eligibility check. It really describes all the possible ways that companies can do business with each other. Of course, there are a number of different ways to get these transactions onto our network. You can use our API, you can use our UI, or, since we are interoperable with this legacy format called X12 EDI, you can use that. It's a file format and transfer protocol that predates XML and predates JSON, and it looks like it came off a dot matrix printer in 1985. So that's a little bit about what we're doing. And as you've probably guessed from the fact that we're talking on this podcast together, we're using a 100% serverless approach, and that's something we've done since day one.
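For readers who haven't seen X12 EDI, here is a minimal sketch of splitting a raw X12 message into segments and elements. It assumes the common delimiters, `~` as the segment terminator and `*` as the element separator; in real interchanges the delimiters are declared in the ISA envelope header, and this sketch ignores the envelope, composite separators and validation entirely. The sample message is made up for illustration.

```typescript
// A parsed X12 segment: the segment ID (e.g. "ST", "BSN") plus its data elements.
interface X12Segment {
  id: string;
  elements: string[];
}

// Minimal sketch: split raw X12 text into segments, then each segment into elements.
// Assumes "~" terminates segments and "*" separates elements; real files declare
// their delimiters in the ISA header, which this sketch does not read.
function parseX12Segments(raw: string, segmentTerminator = "~", elementSeparator = "*"): X12Segment[] {
  return raw
    .split(segmentTerminator)
    .map((s) => s.trim()) // tolerate line breaks between segments
    .filter((s) => s.length > 0) // drop the empty string after the final terminator
    .map((s) => {
      const [id, ...elements] = s.split(elementSeparator);
      return { id, elements };
    });
}

// Made-up fragment of a ship notice (transaction set 856), envelope omitted.
const sampleShipNotice = "ST*856*0001~BSN*00*SHIP123*20200101*1200~SE*3*0001~";
const segments = parseX12Segments(sampleShipNotice);
console.log(segments.map((seg) => seg.id).join(",")); // ST,BSN,SE
```

Positional, delimiter-separated segments like these are a large part of why the format reads like dot-matrix output next to a JSON API.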
Yan Cui: 02:44
So am I understanding it right that these are a kind of predefined, industry-standard forms? Are these also specific to a particular country?
Zack Kanter: 02:54
You know, there are really two competing formats in the world. One of them is called X12 EDI, which has 300-plus different transaction types. The other one is called EDIFACT. EDIFACT is a more limited set, but it was developed by the United Nations and designed to help facilitate international trade. So those are the two that are really used. Part of the challenge is this: it might sound like, okay, there's a global standard, so shouldn't it be relatively easy to exchange transactions back and forth? The problem is that the standard is really a superset of all possible values. So if you look at something like an X12 ship notice, which is transaction number 856, there might be five or six different places where you could put, let's say, a tracking number. So when two companies get together, let's say Walmart and Procter & Gamble want to do business, they have to negotiate, so to speak, a software contract that dictates where Procter & Gamble and Walmart are willing to put this data in the transaction. Walmart, for example, might prefer it at the header level of the transaction, and maybe Procter & Gamble wants to send it at the line level so that you can add detail in terms of what item is in what package. These sorts of nuances are what cause it to take many weeks or months to get one of these integrations up and running today. So what we've done is take this superset of transactions, fork it, I guess you could say, and then modernise it and bring our opinions to it in ways that make it a lot less confusing when two companies want to get together and trade: there's one place to put each piece of metadata around a transaction.
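To make the "five or six places for a tracking number" problem concrete, here is a minimal sketch of normalising two partner-specific layouts into one canonical shape where a tracking number can only live in one place. All type and field names are hypothetical; they model neither the real X12 856 segment structure nor Stedi's actual schema.

```typescript
interface LineItem {
  sku: string;
  quantity: number;
}

// Hypothetical partner layout 1: one tracking number at the header level.
interface HeaderLevelNotice {
  trackingNumber: string;
  items: LineItem[];
}

// Hypothetical partner layout 2: a tracking number on every line item.
interface LineLevelNotice {
  items: (LineItem & { trackingNumber: string })[];
}

// Hypothetical canonical layout: tracking numbers live only on packages,
// and each package lists the items it contains.
interface CanonicalShipNotice {
  packages: { trackingNumber: string; items: LineItem[] }[];
}

function fromHeaderLevel(n: HeaderLevelNotice): CanonicalShipNotice {
  // The whole shipment is treated as a single package.
  return { packages: [{ trackingNumber: n.trackingNumber, items: n.items }] };
}

function fromLineLevel(n: LineLevelNotice): CanonicalShipNotice {
  // Group line items by tracking number so each package appears exactly once.
  const byTracking = new Map<string, LineItem[]>();
  for (const { sku, quantity, trackingNumber } of n.items) {
    const list = byTracking.get(trackingNumber) ?? [];
    list.push({ sku, quantity });
    byTracking.set(trackingNumber, list);
  }
  return {
    packages: Array.from(byTracking.entries()).map(([trackingNumber, items]) => ({ trackingNumber, items })),
  };
}

const canonical = fromLineLevel({
  items: [
    { sku: "A", quantity: 2, trackingNumber: "1Z1" },
    { sku: "B", quantity: 1, trackingNumber: "1Z2" },
    { sku: "C", quantity: 5, trackingNumber: "1Z1" },
  ],
});
console.log(canonical.packages.length); // 2
```

With one canonical shape, each partner needs a single mapping into it, instead of every pair of partners negotiating their own placement rules.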
Yan Cui: 04:54
Okay, I see. I think I've got the gist of it: you guys are providing that consistent format so that all the different businesses can easily transact with each other.
Zack Kanter: 05:02
Exactly right. You might think about it as similar to Stripe. With Stripe, you don't need to work with a different API for MasterCard, Visa, Discover or American Express; you have a single API that makes it easy. We've done that, except we've done it for 300 different transaction types, and made it a network such that any business can join and then transact with other businesses in our standard format.
Yan Cui: 05:27
Gotcha. Yeah. And Stripe at the time was pretty revolutionary in how it manages all of that complexity behind the scenes for you. So in this case, what does your architecture look like from a very high level? How are you facilitating all of these transactions between different companies?
Zack Kanter: 05:41
We started off, you know, we picked serverless from the early days. We're about three and a half years old. We're a venture-backed software startup, and we've raised about $21 million in venture funding. The commitment to this fully serverless approach came from the early days. I can talk to you a little bit about the architecture, but maybe it would help to give you a little bit of the backstory first of how we got here. I came from the world of physical products before this. I founded a brand of auto parts, manufactured a couple thousand different products in Taiwan and China, and used as many outsourced services as I possibly could, so I could just focus on developing these great products. So I used outsourced manufacturing in Taiwan and China. I used outsourced fulfilment, [inaudible], an outsourced warehouse to stock and ship the product, and so on and so forth. And it was not just an enormously scalable way to do things, but a profitable way to do things, because it meant that every single time we got an order, we could fulfil that order at a very predictable cost. We knew that it would cost us maybe $4.50 to fulfil an order, and that no matter how large we scaled, or how few orders we had in a day, we were going to pay this variable cost. In addition to that, you have this phenomenon where, hopefully, the warehouse is always getting better. They're always negotiating better rates with UPS and FedEx and the post office, they're opening up multiple locations, they have insurance against business interruption, and all these sorts of things that come from running a warehouse. What it meant was that, as the founder of the company, I could focus exclusively on things that added value for the business.
Ultimately, I sold that to a private equity fund in 2018, and it ended up being a remarkably profitable business because of the focus we had on just making great products and landing new customers. If this sounds familiar, it's probably because a lot of the same philosophies carry over to the world of software development and serverless, and they're why so many of us are attracted to the serverless model. So when I started Stedi, sort of naive to the world of software development, I was doing my research and reading about the various cloud providers, and it became very clear that AWS was far and away the broadest and deepest offering for managed services. And I think naively, I just sort of assumed, well, of course, this is how software should be built: you should use as many managed services as possible, you should maintain as little code as you possibly can, and you should push all these things off onto a trusted vendor. I guess I had the experience, or the advantage, of having worked with a trusted vendor, or a number of trusted vendors, for a very long time. So the idea of lock-in and all these things didn't really worry me all that much. As for what it looks like today: we are 100% TypeScript, and all of our infrastructure is written in CDK. I can explain how we ended up getting there and some of our journeys through CloudFormation and different languages, but we are 100% CDK. We use TypeScript for the front end. All of our Lambdas are written in TypeScript. We use API Gateway, both V1 and V2, depending on the application. We're trying to use the V2 HTTP APIs wherever possible, just because they have quite a few advantages.
We use Cognito for authentication, SQS for queues, and SNS for all sorts of notifications and event-driven things, along with EventBridge in a bunch of new applications. Persistence is all with Dynamo, and we're doing some interesting things with multi-tenancy in Dynamo. Of course, we use Lambda for compute, though we try to limit the amount of code we have running in Lambdas and push these things off onto managed services as much as possible. S3 for storage. And then all the other things like STS, Parameter Store, KMS, and all the little goodies in between.
Yan Cui: 10:41
It's amazing to hear that someone who's new to software development doesn't have all that baggage and straight away thinks this is how software should be done, because it's natural and it makes sense, whereas so many of us are still, I guess, burdened by the way we're used to doing things. It's really amazing to hear that you've applied the same mindset of focusing on differentiation, on the things where you can add value, rather than doing things that AWS can do a much better job of, in terms of providing that infrastructure and foundation for your application.
Zack Kanter: 11:22
Well, it's funny, because I think there's so much to learn from the crossover between software development and physical manufacturing. Of course, a lot of the terminology that people use, lean manufacturing, lean software development, lean startup, all comes out of the Toyota Production System, which was the revolutionary system that allowed Toyota to take such a tremendous lead over the US manufacturers. There's this amazing book called Out of the Crisis by W. Edwards Deming. It was written, I think, in the 80s or the 90s, as sort of a roadmap or a wake-up call to US manufacturers who were getting demolished by Japanese upstarts, if you want to call Toyota an upstart. Things he recommends in it include small batch sizes, which is the sort of DevOps movement we're seeing today: the idea of “if it hurts, do it more often” is really similar to, and taken from, the Toyota Production System idea of changing moulds quickly, if any of this stuff is sounding familiar. But in Out of the Crisis, he talks about the idea that one of the biggest ways you can increase efficiency is to reduce the number of suppliers you have and increase your dependency on them, which is really counterintuitive. I mean, you hear everybody today talking about, oh, should we have a multi-cloud strategy? How should I build things in order to be portable? In manufacturing, that really used to be the idea as well, but 40 years ago, in studying these Japanese manufacturers, what they found was that when you use multiple suppliers, you end up working to the lowest common denominator.
For example, if you have one supplier who specialises in high-strength steel and one who specialises in great finishes, or something like that, you have to reduce your quality standards, or maybe the specialisation of the products you're making, in order to use both of these manufacturers efficiently and interchangeably. That maps perfectly to what you see happen when people say, “Okay, we're going to make sure that any service we use can be swapped between GCP, Azure and AWS.” Then, of course, you have to restrict yourself to the feature sets that are available across all three providers, and you're just shooting yourself in the foot. You're also increasing the points of contact, the quality control you need to do, and the understanding of supplier processes you need, versus committing 100% to a certain supplier. And that's how manufacturing works. It's commonly accepted practice in the manufacturing space that the risk of lock-in, the possible risk of your vendor trying to screw you at some point in the future, is a lot lower than the actual cost of maintaining contacts with multiple different suppliers, and of hamstringing yourself so you can't use the different pieces of functionality, just to remain agnostic between providers.
Yan Cui: 15:13
It's funny that all of these things you talked about are exactly what we're seeing happen right now in the software world, especially with containerisation and all the arguments around portability and vendor lock-in, even though that's not the main problem any of these companies talk about. People always talk about having trouble with velocity, with being able to innovate faster, but at the same time, they're not making technology decisions that allow them to go faster by, like you said, minimising the number of providers and increasing their dependency on the providers they have, so that they can draw more value from them. It's amazing to hear that all of this has happened before and the lesson has already been learned. It sounds like we have a lot we can learn from manufacturing to help us make better decisions in software.
Zack Kanter: 16:04
Yeah, it's funny. I guess these seem like radical leaps of faith, to say we're going to 100% commit to AWS, but I think these things are commonly accepted in other areas. The other piece of it is that in US manufacturing, up until the 80s or the 90s, when lean manufacturing started to take off, the mindset between suppliers and buyers was sort of adversarial. You'd assume that once you got locked into a supplier, your supplier was going to consistently try to raise prices and do as little work as possible. And when you look at the historical relationships between software companies and companies like Oracle, with this land-and-expand sort of strategy where Oracle tries to lock you in, make it difficult for you to leave, raise prices over time and lock you into these big contracts, it's no wonder we've been trained in the software world to believe that lock-in is bad. If you contrast this with the Japanese Toyota Production System model, the lines between Toyota and its suppliers are quite blurred. When they're optimising a factory, they're not looking just at their own factory; they're looking at all the factories of their suppliers. Getting lean manufacturing, the Toyota Production System, fully implemented was a multi-decade effort for Toyota, because they had to do it not just internally, but also teach their suppliers how to do it.
And in doing so, they found something interesting. If you look at the way a traditional US manufacturer works on a project that they're going to send off to a supplier or subcontractor to make, they will have nine people from the buying side and one person from the supplier side. They'll bring in one supplier engineer or sales engineer to help them work through how this is going to work on the supplier's machines, but the primary focus is on the internal team. In the Toyota Production System way of doing things, you have nine people from the supplier and one representative from the buyer. The buyer will send someone with exact requirements and rough ideas for implementation, and they will rely on the vendor's expertise to tell them how to build it. And of course, what unites the buying and supplying sides in the Toyota Production System, or lean manufacturing, is the common goal of eliminating waste wherever possible. Once they're united in this idea of eliminating waste, everybody wants to reduce the waste, everybody wants to reduce the amount of communication back and forth, and everybody wants to reduce prices, because they know that when those things happen, the savings get passed along to the customer, either as lower prices or higher quality, and the relationship will continue to grow because of it. So you can see a lot of similarities in the way someone like you, or someone like Stedi, works with AWS, where we go to them and say, “Tell us the best practices for how you build software and your systems.
Tell us which services, which integrations and which patterns we should use.” And we'll send some representatives from our side to go in and learn with you and see how we can apply this to our architecture. So you're really putting a lot of the burden on the supplier to tell you how to do things best, because the reality is, this is their machinery, this is their world. Further, you get to this place because you know you've chosen a supplier who is committed to eliminating waste. With someone like AWS, we really believe they are customer-driven and constantly looking to reduce their prices. And that's not just something we believe; there's a long history of evidence of them proving that they are committed to reducing prices. We love getting locked in because they are willing to do things like come up with Lambda, as opposed to EC2, when EC2 was already doing great, in order to make things easier for both of us, knowing that if they can reduce our costs and increase our quality, the relationship will grow overall. And there's this idea from economics called Jevons paradox, which says that the more efficiently a resource can be produced, the more of it will be consumed. You would think it works the other way around, that as we get more efficient, we use less. But as you see with Lambda, it's a very efficient way of producing compute, and for us, it means we're going to use much, much more compute over time, and it becomes a win-win for both us and AWS.
Yan Cui: 21:30
And speaking of best practices, one of the things you've been building is multi-tenant capability in Stedi. Can you tell us about how you approached this? Because I know you've researched quite a bit to understand the known best practices and how to make sure your customers can't access each other's data.
Zack Kanter: 21:52
Yeah, absolutely. We were hugely influenced by a talk at AWS re:Invent 2019 called Serverless SaaS deep dive: Building serverless SaaS on AWS. It was ARC410, for those of you who want to look it up. It was a talk by Tod Golding, who is from AWS SaaS Factory, an internal AWS organisation that's sort of like an evangelist organisation for explaining how best to build SaaS on AWS, not just serverless SaaS. This talk dived into the serverless aspect of it. Up until that point, we had thought we were doing things pretty close to best practices. When we saw that talk, it really opened our minds to a whole new level of what it can mean to build software in a managed services environment. I really encourage people to watch the talk, because I'm not going to do it justice, but Tod starts off with the idea that the most important concept in a SaaS product is the concept of tenancy: choosing who the tenant is in your system, and then making decisions in your architecture that make a multi-tenant architecture as easy as possible. When you look at what that means for Stedi, our tenant is an organisation. So a hypothetical Coca-Cola and Walmart would each be tenants in our system. Then you also have users, and we're in a slightly complex environment because each organisation has multiple users, but a user can also be a member of multiple different organisations. The example might be an accountant or a CPA who is consulting for two different companies: they would be a member of two different organisations with one single identity.
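The tenancy model described here, where the organisation is the tenant and one user identity can belong to several organisations, can be sketched as a simple membership check. The class and method names below are hypothetical and the store is in memory; in the system Zack describes, the resulting context would be issued as a tenant token through Cognito and custom authorizers, which this sketch does not model.

```typescript
// Hypothetical tenancy model: the organisation is the tenant, and one user
// identity can be a member of several organisations.
interface TenantContext {
  userId: string;
  tenantId: string; // the organisation ID
}

class MembershipDirectory {
  // userId -> set of organisation IDs the user belongs to
  private readonly memberships = new Map<string, Set<string>>();

  addMember(userId: string, orgId: string): void {
    const orgs = this.memberships.get(userId) ?? new Set<string>();
    orgs.add(orgId);
    this.memberships.set(userId, orgs);
  }

  // A user assumes the context of one organisation at a time; every
  // downstream call is then scoped to that single tenant.
  assumeOrganisation(userId: string, orgId: string): TenantContext {
    if (!this.memberships.get(userId)?.has(orgId)) {
      throw new Error(`user ${userId} is not a member of ${orgId}`);
    }
    return { userId, tenantId: orgId };
  }
}

// The CPA example: one identity, two organisations.
const directory = new MembershipDirectory();
directory.addMember("cpa-user", "org-coca-cola");
directory.addMember("cpa-user", "org-walmart");
const ctx = directory.assumeOrganisation("cpa-user", "org-walmart");
console.log(ctx.tenantId); // org-walmart
```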
Once you've figured out this tenancy, there's a way of weaving tenancy into AWS that makes the whole problem of data isolation, metrics and analytics much, much easier. The way we do this is very close to, or exactly, how Tod recommends, which is that we have a tenant token. We have an authorisation system built on Cognito with custom authorizers, and we generate this tenant token whenever a user assumes the context of an organisation. That tenant token is included on every single call that we make, so it's used for all the services we might call on the back end. When you start doing this, and I'll use DynamoDB as an example of how it works, things start to get pretty cool. We use a pooled database: we're not using a separate table per customer. For a given service, you're putting, for example, all the transactions into a single table, and you use the tenant ID, which is the organisation's ID, as the partition key in Dynamo. You can set up the IAM roles such that a caller only has access to data where the partition key equals the tenant ID. I'm glossing over some of the details here, but it's basically role-based access. What that means from a data standpoint is that it's very, very difficult to accidentally get access, or give access, to somebody else's data, because it's all in the partition key. It also simplifies the code you're writing.
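One documented way to express "a caller may only touch items whose partition key equals the tenant ID" is DynamoDB fine-grained access control: an IAM condition on the `dynamodb:LeadingKeys` key with `ForAllValues:StringEquals`. The sketch below only builds that policy document as a plain object; the table ARN and tenant ID are hypothetical, and how the policy is attached (for example as a session policy when assuming a role through STS) is an assumption, not something the episode specifies.

```typescript
// Minimal shape of an IAM policy document (only the fields used here).
interface IamStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
  Condition?: Record<string, Record<string, string[]>>;
}

interface IamPolicyDocument {
  Version: "2012-10-17";
  Statement: IamStatement[];
}

// Allow item-level access only where the partition key ("leading key")
// equals the caller's tenant ID.
function tenantScopedTablePolicy(tableArn: string, tenantId: string): IamPolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "dynamodb:GetItem",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem",
          "dynamodb:DeleteItem",
          "dynamodb:Query",
        ],
        Resource: [tableArn],
        Condition: {
          "ForAllValues:StringEquals": { "dynamodb:LeadingKeys": [tenantId] },
        },
      },
    ],
  };
}

const walmartPolicy = tenantScopedTablePolicy(
  "arn:aws:dynamodb:us-east-1:123456789012:table/Transactions", // hypothetical table ARN
  "org-walmart", // hypothetical tenant (organisation) ID
);
console.log(JSON.stringify(walmartPolicy, null, 2));
```

Because the restriction lives in the credentials rather than in application code, a query that names the wrong tenant's partition key is denied by IAM instead of depending on every developer remembering a filter.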
So instead of a developer working on some call to Dynamo thinking, “Okay, I need to think about what the tenancy context is here, and make sure I'm writing these SELECT statements, or whatever, very carefully so that we're getting the right data,” the code can be written such that it just says, “Give me all the data in Dynamo matching this query,” as opposed to “Give me all the data that belongs to me.” Then AWS, IAM and Dynamo do the hard work of figuring out what you're supposed to have access to and handing that back to you. So that's the first cool thing that happens when you're working with this tenant ID, or tenant token. The second is something that really excites me coming from the business side of things, which is the accounting implications. The accounting implications of serverless are, I think, one of the most exciting things about the world of serverless. If you're familiar with the business side of the house, and you're looking at your financial results from the previous period, let's say the previous quarter or the previous month, you have your revenue, which is the amount of money you billed your customers. But you also need to know your cost of goods sold, which is basically all the things that go into delivering the product to the customer. In a business like ours, it's really the cost of processing a transaction. So if we get a transaction from a customer, or on behalf of a customer, we charge 10 cents per transaction for the first 50,000 transactions in a month, and then one cent per transaction for the next 950,000. Let's say we bill the customer 10 cents.
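Taking the illustrative prices from the conversation at face value, the monthly bill for a customer can be computed tier by tier. This is a minimal sketch in integer cents (to avoid floating-point drift); the tier table and function name are hypothetical, and because no price above 1,000,000 transactions is given, the sketch rejects higher volumes rather than guessing.

```typescript
// Tiers as described: 10 cents each for the first 50,000 transactions in a
// month, then 1 cent each for the next 950,000. Nothing is stated beyond
// 1,000,000 transactions, so this sketch refuses to price higher volumes.
interface PriceTier {
  upTo: number; // cumulative transaction count at which this tier ends
  centsPerTransaction: number;
}

const transactionTiers: PriceTier[] = [
  { upTo: 50_000, centsPerTransaction: 10 },
  { upTo: 1_000_000, centsPerTransaction: 1 },
];

function monthlyBillCents(transactions: number, tiers: PriceTier[] = transactionTiers): number {
  const lastTier = tiers[tiers.length - 1];
  if (transactions > lastTier.upTo) {
    throw new Error(`no price given above ${lastTier.upTo} transactions`);
  }
  let billedCents = 0;
  let previousLimit = 0;
  for (const tier of tiers) {
    // Number of this month's transactions that fall inside this tier.
    const inTier = Math.max(0, Math.min(transactions, tier.upTo) - previousLimit);
    billedCents += inTier * tier.centsPerTransaction;
    previousLimit = tier.upTo;
  }
  return billedCents;
}

console.log(monthlyBillCents(1)); // 10
console.log(monthlyBillCents(60_000)); // 510000 (50,000 x 10 + 10,000 x 1 cents = $5,100)
```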
Now, the cost of fulfilling that transaction, all those API calls and Dynamo calls and Cognito fees and all the things related to it, rolls up to some fraction of a cent. You're probably intimately familiar with the AWS Cost & Usage Report. You could take that CSV, roll it up and bring it into your accounting system. You could say we spent $1,000 last month on AWS fees and billed our customers $10,000 last month, so our cost of goods sold is about 10%. But what you can't do is say, tell me what my profitability is on a customer-by-customer basis. This is particularly hard if you're in an EC2 or Kubernetes or container world where you're trying to allocate costs on a per-tenant basis: it's all lumped together in these always-on instances, and you can't really figure out what the costs are. What serverless does is, first of all, mean that in a properly designed system, you're only spending money with AWS when you're invoking a transaction on behalf of your customer. The second thing is that you can tie the tenant ID, or the tenant token, into all of your CloudTrail or CloudWatch logs, along with some basic metrics about these invocations. So you can have your Lambda emit in its log data, “Hey, I'm a Lambda function. I'm this size. I ran for this many seconds and processed this amount of data.
And I did it on behalf of this tenant.” We can then take those CloudWatch logs, roll them up in some sort of aggregation service, take those metrics, multiply them by the prices from the AWS Pricing API, and figure out the exact cost on a tenant-by-tenant, or even transaction-by-transaction, basis. That might sound like a whole lot of bean counting, but the idea is that if we can get a very granular view into what a transaction costs us for a given customer, we can price lower than anybody else in the world, and that's a tremendous advantage. So those are, I would say, the two main points: first, the ability to dramatically simplify the code you're writing by letting AWS do the heavy lifting of IAM access control and data partitioning; and second, the accounting implications.
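Here is a minimal sketch of the per-tenant cost roll-up described above: each invocation logs one JSON line tagged with its tenant, and an aggregator converts memory and duration into GB-seconds and multiplies by rates. Lambda's own `REPORT` log line already carries duration and memory size but not the tenant, which is why a custom line is needed. The record shape and function names are hypothetical, the rates are placeholders assumed for illustration (fetch real ones from the AWS Price List API for your region and architecture), and billing granularity, free tier and non-Lambda costs such as DynamoDB or API Gateway are ignored.

```typescript
// One structured log line per invocation, tagged with the tenant it ran for.
interface InvocationRecord {
  tenantId: string;
  functionName: string;
  memoryMb: number; // e.g. the configured memory size of the function
  durationMs: number; // e.g. measured inside the handler
}

// Written to stdout as JSON so it lands in CloudWatch Logs with the function's output.
function logInvocation(record: InvocationRecord): string {
  const line = JSON.stringify({ type: "invocation-cost", ...record });
  console.log(line);
  return line;
}

// Placeholder rates (assumed); look up real values for your region and architecture.
interface LambdaRates {
  perGbSecond: number;
  perRequest: number;
}

// Aggregator side: sum compute plus request cost per tenant.
function costByTenant(records: InvocationRecord[], rates: LambdaRates): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const gbSeconds = (r.memoryMb / 1024) * (r.durationMs / 1000);
    const cost = gbSeconds * rates.perGbSecond + rates.perRequest;
    totals.set(r.tenantId, (totals.get(r.tenantId) ?? 0) + cost);
  }
  return totals;
}

const assumedRates: LambdaRates = { perGbSecond: 0.0000166667, perRequest: 0.0000002 };

const logLines = [
  logInvocation({ tenantId: "org-a", functionName: "translate", memoryMb: 1024, durationMs: 2000 }),
  logInvocation({ tenantId: "org-b", functionName: "translate", memoryMb: 512, durationMs: 1000 }),
  logInvocation({ tenantId: "org-a", functionName: "deliver", memoryMb: 1024, durationMs: 500 }),
];

// The aggregation service parses the log lines back into records.
const sampleRecords = logLines.map((line) => JSON.parse(line) as InvocationRecord);
costByTenant(sampleRecords, assumedRates).forEach((dollars, tenant) => {
  console.log(tenant, dollars.toFixed(8));
});
```

Dividing each tenant's total by that tenant's transaction count gives the per-transaction cost Zack describes, which can then be set against revenue for a per-customer margin.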
Yan Cui: 31:30
So on the accounting side of things, are you familiar with Simon Wardley?
Zack Kanter: 31:35
I've followed him on Twitter, but I haven't seen his discussions around accounting at all.
Yan Cui: 31:39
So Simon has often talked about this idea of FinDev, whereby finance and development work together. One aspect of that is that with serverless you have pay-per-use, so, as you said, you can work out the cost of individual components, as well as the cost of individual customers or tenants in your system. In this case, you can do micro-optimisations by identifying which components you should focus on optimising, because maybe 99% of your architecture is just not worth optimising. In a previous episode, I think it was episode 17, when I spoke with Aleksandar and Slobodan, we also talked about this idea of FinDev and how we can use it to prioritise work and figure out which issues are worth fixing. Maybe it's a bug, but if you look at how much impact it's having, maybe it's only costing you $10 a month, whereas actually fixing it may take days of engineering time, which can easily amount to hundreds or even thousands of dollars. So you can start taking the cost impact of different issues into account when you decide which issues to work on first. And Simon Wardley has been one of the pioneers in this area of looking at serverless not only as a development methodology, but also at how it enables businesses to work differently, and how it enables finance and accounting departments to collaborate with development teams.
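The prioritisation argument is simple arithmetic: compare what an issue costs per month with what it costs to fix. Here is a minimal sketch with a hypothetical helper; the $500 per engineering day figure is an assumption for illustration, not a number from the episode.

```typescript
// Hypothetical helper: months until fixing an issue pays for itself in
// infrastructure savings alone (ignores any non-cost impact of the issue).
function paybackMonths(monthlyCostOfIssue: number, engineeringDays: number, costPerEngineeringDay: number): number {
  if (monthlyCostOfIssue <= 0) return Infinity; // the fix never pays back in cost terms
  return (engineeringDays * costPerEngineeringDay) / monthlyCostOfIssue;
}

// The episode's example: a $10/month issue that takes days to fix.
// Two days at an assumed $500/day:
console.log(paybackMonths(10, 2, 500)); // 100
```

A payback period of over eight years suggests the issue can wait, unless it has user-facing or reliability impact the cost figure doesn't capture.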
Zack Kanter: 33:16
Yeah, I love the term FinDev. Every time I hear it, I remember how much I like it. It's funny: a lot of people's eyes glaze over when you talk about finance or accounting, similar to how the eyes of people outside the world of technology glaze over when you start talking about engineering concepts. That happens because people believe it's just too complicated for them to understand, that it's a dark art you have to go through some special learning process to understand. But the reality is that 10 years ago, I hardly knew anything about finance and accounting, and 4 years ago, I knew hardly anything about software engineering and software development. These things are very learnable. So I would say, for engineers who are interested in this idea of FinDev, “finance and accounting” is an overloaded term; it's too big of a concept. The way we look at things is, first of all, we believe in this idea that how you do one thing is how you do everything. I think one problem a lot of technology companies have is that they treat engineering problems as first-class problems, and finance problems, and all these other back-office things, as second-class problems that aren't as important. If you've read the Google SRE book, they talk about the idea that SRE is what happens when you take the principles of software engineering and apply them to operations. We believe it's possible to apply that to all sorts of business operations. And the important thing in getting this done is to marry the world of engineering and the world of finance very early.
And to get back to what I was saying before, you don't need to think about all these fancy things like the income statement, the balance sheet, the cash flow statement, market caps, or any of these finance and accounting terms. What it really comes down to is good bookkeeping: how you allocate your costs, your expenses and your revenue. Everything else is built on top of that. The reason I bring up this idea of weaving the tenant ID into your logs, so that you can figure out exactly where you're spending money in AWS, is that it's the most granular business view into your engineering systems that you could possibly have. A lot of times, what you see companies doing is they have all this stuff going on in AWS engineering land, and then they package the logs up and ship them off to some BI system, and from there some financial analysts look at it and try to make sense of it. We think the wrong way to go about it is to take engineering data and bring it into the world of finance. We think the right way is to embed finance people in the engineering teams, and make sure the accounting and bookkeeping data is brought in and accounted for as close to the metal as possible, so to speak. I know I'm talking in vague terms here. But the basic idea is that if you look at the AWS Cost and Usage Report, it's hundreds of thousands of lines long, and it might look like the most detailed, great thing in the world. But the reality is that it doesn't give you any business information. It doesn't tell you anything about how those costs roll up to customers.
And if we're talking about AWS wishlist items: you can tag a resource so that the tag shows up in the Cost and Usage Report. So you can say, hey, I'm going to give this Lambda the tag “foo”, and then in the usage report I can roll things up by tag “foo”. But that only gives you a static tag for a Lambda; it doesn't give you a dynamic tag applied at runtime. You can't dynamically apply a tag for each tenant's call, for example, and then have that show up in the usage report. So we would love to see that. But we think it's probably somewhat unlikely they're going to do that anytime in the near future.
Yan Cui: 38:05
Yeah, that's probably a bit difficult for them to implement, being able to tag individual invocations and requests across all of their services, not just Lambda but also DynamoDB, API Gateway, and so on. I have implemented cost allocation engines similar to yours for other customers, where we basically did the same thing: we write a log message every time we access something in DynamoDB or S3, and we embed the tenant ID in that message so that we can process the logs after the fact and create aggregate reports, every day and every month, on how much cost we have incurred serving each particular customer. Is there anything else that you would like to add to your AWS wishlist?
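Here is a minimal sketch of the log-then-aggregate approach Yan and Zack describe, assuming one JSON log line per DynamoDB or S3 access, each tagged with the tenant ID. The field names, the cost_log_line and aggregate_costs helpers, and the per-unit prices are all hypothetical placeholders. They are not real AWS pricing, and they are not either company's actual implementation.

```python
import json
from collections import defaultdict
from datetime import date

# Hypothetical per-unit prices in USD; placeholders only, not actual AWS pricing.
UNIT_PRICES = {
    ("dynamodb", "read"): 0.00000025,
    ("dynamodb", "write"): 0.00000125,
    ("s3", "get"): 0.0000004,
}


def cost_log_line(day: date, tenant_id: str, service: str, operation: str, units: int) -> str:
    """Build one structured log line for a resource access, tagged with the tenant ID."""
    return json.dumps({
        "day": day.isoformat(),
        "tenant_id": tenant_id,
        "service": service,
        "operation": operation,
        "units": units,
    })


def aggregate_costs(lines):
    """Roll log lines up into an estimated cost per (day, tenant) pair."""
    totals = defaultdict(float)
    for line in lines:
        event = json.loads(line)
        price = UNIT_PRICES[(event["service"], event["operation"])]
        totals[(event["day"], event["tenant_id"])] += price * event["units"]
    return dict(totals)


if __name__ == "__main__":
    d = date(2020, 6, 1)
    logs = [
        cost_log_line(d, "tenant-a", "dynamodb", "read", 4_000_000),
        cost_log_line(d, "tenant-a", "s3", "get", 1_000_000),
        cost_log_line(d, "tenant-b", "dynamodb", "write", 800_000),
    ]
    for (day, tenant), usd in sorted(aggregate_costs(logs).items()):
        print(f"{day} {tenant}: ${usd:.2f}")
```

In a Python Lambda function, printing each line sends it to CloudWatch Logs. A scheduled job could then collect a day's lines and pass them to aggregate_costs. With the sample data above, the output is $1.40 for tenant-a and $1.00 for tenant-b.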
Zack Kanter: 38:53
You know, it's funny, so many things on our AWS wishlist just get done. They just seem to show up on a regular basis. But with things like tenant-based costing, or dynamic tag-based costing, we're actually really happy that AWS hasn't implemented this. It's a really tough thing to build and to do right, and we like having it as an advantage over our competitors, because we like the fact that we can understand our usage and cost patterns and our customer profitability better than anybody else. And, you know, I actually think this is one of the main advantages that AWS has over other cloud providers. Knowing people inside GCP, AWS and Azure, AWS is way better at internal cost accounting. At the meta level, doing this sort of thing on their side so that they can understand their profitability by customer, they are light years ahead of GCP or Azure. And I suspect it's because the founding story of AWS really comes from the world of physical products. Amazon as a retailer probably took it for granted that it should understand its cost of goods sold on a granular, customer-by-customer, product-by-product basis. So when they started AWS, they probably brought some of those assumptions in, maybe naively, and just did a very good job of cost accounting from the early days. When you look at GCP and Azure, by contrast, Google and Microsoft both print money. Amazon is a low-margin business (AWS, of course, is not), whereas Microsoft and Google are not. And so I imagine they just came with a different level of cost discipline.
So those are just a couple of high-level thoughts on when we're glad that AWS doesn't build something, and this is one of those rare cases. But in terms of other things that we want: we would love to see a 100% serverless, pay-per-use Elasticsearch. That's a big one. Elasticsearch is the only piece of our infrastructure that doesn't have pay-per-use billing, and that's painful. The other one is IAM authorization on HTTP APIs, which we've talked about needing in the past. And I don't think HTTP APIs have throttling built in yet, like API keys with throttling and stuff like that. Those are a couple of the other things that come to mind.
Yan Cui: 41:56
Yeah, that's funny, I think you're the fourth person on this podcast to ask for serverless Elasticsearch. It's one of the few things where pretty much everyone still has to run Elasticsearch and pay for EC2 uptime. And for the HTTP API side of things, IAM authentication and throttling, those should be coming. AWS wouldn't commit to any timeline, but they're definitely coming, because I know one of their priorities for API Gateway HTTP APIs this year is to get to feature parity with REST APIs. So, you know, fingers crossed, those won't be too far off. Okay, and I think that's the end of the questions that I've got. Is there anything else that you'd like to share with the listeners, maybe a personal project or things that you are doing at Stedi?
Zack Kanter: 42:43
You know, I think the biggest thing is that we just continue to realise that you can push this stuff so much further than you think. As an example, I think we have four or five former AWS folks on our engineering team, so we think we have a pretty good grasp of AWS, and then you go off and watch that talk by Tod Golding and realise how far people are pushing things out there. And some of our internal projects, these more business-, accounting- and finance-focused projects where we're getting very granular cost accounting from the early days, are things that we're very excited about. But the bigger piece, maybe, is that we've arrived at this idea of managed services and serverless as a subset of our core philosophy, which is that we build for scale and we build using tools that are continuously improving over time. And that's not just on the engineering side; that's on the product side, the billing side, the HR side. We use best-of-breed tools wherever we can. So we don't view serverless and managed services as this radical thing. It's just another piece of the puzzle; it's basically how we run the engineering side of things. We look at all these concepts under an umbrella principle that we call zero-touch operations. What that means is that we're pushing to have no buttons to press and no operational toil, and that's not just internal, it's also external, for our customers. So we are building a 100% self-service platform, so that customers never have to open a support case to get some new feature or piece of functionality, add more users, form trading partner connections, update their credit card, or any of these things.
That might sound like table stakes in the world of engineering, but in the world of business tools it's still pretty revolutionary. So we don't expect our customers to press buttons or deal with inefficiencies, and we don't expect to deal with them internally either. And so I would just say that, to me, it's a bad sign if using managed services inside a company is contentious, because that's not really an engineering problem or a question of engineering principle; it's a broader problem with how the company is run overall. Or maybe that's too strong a statement: it reflects what the company's principles or goals are overall when you look at these sorts of decisions. We look at it as trying to gain the maximum amount of leverage and have people work on things that they really enjoy working on. And I think the catch with all this is that when you make the decision to build on these managed services and live in the AWS world, it really does require a total commitment. If you go 80% of the way there, and you don't have this relentless curiosity to understand the best practices and stay on top of this stuff, I think the results are worse than running a Rails monolith in a Docker container on ECS or something. It just requires that level of commitment. And I think that's been the biggest challenge for us: finding the people who are philosophically aligned. But now we're 26 people who are all philosophically aligned. So I know I didn't quite answer your question, but my personal project, I would say, is building an organisation with this sort of mindset, where people just want to get together and build something that's at the very forefront of what's possible.
We very much believe that it's achievable for us to process every B2B transaction on the planet with a total team of 150 people. And I don't mean a total engineering team of 150 people; I mean a total team across all functions. So that's what's top of mind for me.
Yan Cui: 47:27
Amazing. And I certainly hope you guys succeed in dominating that particular space. And I hope more people hear your words and see this partnership with AWS, or whoever your cloud provider is, as a win-win rather than an adversarial relationship. It's not a zero-sum game; everyone can work together for everyone's betterment.
Zack Kanter: 47:51
Yeah, I couldn't agree more.
Yan Cui: 47:53
Again, Zack, thank you so much for joining us on this podcast and sharing your stories. I loved the experience you shared from manufacturing; I had no idea there were so many commonalities between our fields.
Zack Kanter: 48:04
Yeah, and I'm a huge fan of the podcast and a huge fan of all your writing. Not a week goes by without one of your blog posts circulating internally. So thanks for all the work you do.
Yan Cui: 48:16
Excellent. Glad I could help. And again, thank you very much and take care.
Zack Kanter: 48:21
Take care. Thanks.
Yan Cui: 48:22
Bye bye.
Yan Cui: 48:35
That's it for another episode of Real World Serverless. To access the show notes and the transcript, please go to realworldserverless.com. And I'll see you guys next time.