Real World Serverless with theburningmonk

#2: The case for monorepos with Joe Emison

Yan Cui Season 1 Episode 2

This is part 1 of my conversation with Joe Emison, who has been building serverless applications since 2015, first with Firebase on GCP and later with AWS Lambda. We discussed the value of serverless, especially to startups, and why he has built Branch Insurance to be fully serverless. Joe is a strong proponent of the monorepo approach, and we had a good chat about why he feels so strongly about it and how his team structures their application in a monorepo. And of course, we also covered the pain points of this approach and the tradeoffs his team has accepted.

In part 2, we will talk about how serverless has changed his recruitment philosophy, the pain points with adopting serverless and why Microsoft will be the main challenger to AWS in the cloud computing market.

You can find Joe on Twitter as @JoeEmison and check out his new venture gobranch.com.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday/
License: http://creativecommons.org/licenses/by/4.0/

Yan Cui:   0:13
Hi, welcome back to Real World Serverless, a podcast where I speak to real-world practitioners of Serverless and get the stories from the trenches. Today, I'm going to be speaking to Joe Emison, who is the CTO of Branch, and who has a long history with Serverless technologies. So welcome to the show, Joe.

Joe Emison:   0:32
I've been developing Serverless applications since about 2015, starting with Firebase, and I've done Serverless development on both the Google cloud and Amazon. I've played around a bit on Microsoft's cloud as well. The real driver in going to Serverless for me was knowing how much time I and my teams were spending on infrastructure, and also the ways in which infrastructure problems continually disrupted our ability to deploy and develop software: how developers get blocked on infrastructure problems, and generally how development gets stuck when real-world infrastructure problems, even in the cloud, come into play. So I've overseen teams building applications in commercial real estate on Firebase, and now a full insurance backend and front end for a home and auto insurance company in the United States. I have found that Serverless does give me all of the benefits I would have expected: I need fewer people working on operations, we spend much less time on operations, and software development velocity is faster and much more predictable.

Yan Cui:   1:54
Yeah, that's really interesting, what you focused on there in terms of how much overhead there is with infrastructure. I remember listening to a podcast a while back, I think it was Corey Quinn with Matt Klein, who's the creator of the Envoy proxy, and one of the quotes I took from that podcast was Matt saying that unless you are an infrastructure company, infrastructure is basically overhead. How much time do we spend on overhead to run our business versus actually building the things that are going to help us differentiate our products? It's just ridiculous. Having been doing this for so many years myself, I totally understand all the problems that come with having infrastructure and being responsible for it.

Joe Emison:   2:34
Yeah, and if you're a vice president of engineering, you can walk in in the morning knowing all of the important feature work that you're working on, knowing all of the additional work you're trying to do to make software development faster, better, and easier for your teams. But you can walk in every day to another minor fire or crisis and find yourself spending the entire day just trying to fix what's going wrong in production, even though in theory the goal had always been: software developers develop, then it goes into production, and it just works. But the iteration of things over time, with that attitude, led to operations making more and more rules, making it harder and harder to deploy software. So I think as an industry we realized we needed to do something about it, and we ended up with more development ownership of code, and more development ownership of infrastructure and uptime. But the problem is, as that's happened, more and more of these issues that are not really core issues to software development keep seeping back into development. As we drive more towards infrastructure as code, it means developers have to spend more and more time on infrastructure and on uptime. So if you decide as an organization to take more and more of the infrastructure on yourself, even though it's in the cloud, if you're taking Kubernetes on yourself, if you're taking VM baking on yourself, then all of those things are going to seep back into the entire development organization. All of them are going to block people as you run into problems with them. So the Serverless mindset, I guess, as Ben Kehoe calls it, is one where you're constantly seeking to offload those things that aren't differentiated for you, and you're offloading them onto vendors who handle more and more of those capabilities for you. So yeah, I completely agree that infrastructure is overhead, and really, anything that doesn't directly contribute to the unique value your organization is delivering is overhead. To make matters worse, as an industry we're not really good at thinking about what we should be letting go of. Anything you build in your organization, there tends to be a sense of pride over it, and it ends up being hard to throw away, even if there's no value in you having built it or running it yourself.

Yan Cui:   5:08
Yeah, that's so true. I've had so many engineering teams spend a very long time building infrastructure. At the end of it, you feel really clever, really smart, and as you said, really proud of the work, but you've just spent weeks and weeks doing something that is not going to directly contribute towards the product, making it better or adding any business value, and yet it's something you end up having to carry for a very long time, because now it's part of the architecture you own for your application.

Joe Emison:   5:35
Yeah, and one interesting thing happened at Branch; this is something we live with every day, this question of should we be building it. One of the things you have to do in the United States as an insurance company is generate forms. Essentially, insurance is: you pay a certain amount for a contract, and that contract will pay you out if certain things happen. There are lots of regulations in the United States where you have to provide to every state all of the individual forms you're going to generate for people. Initially, the prices and the options I saw from companies who would manage that for you, I didn't like any of them. They were very expensive. They seemed very clunky. So we developed that ourselves initially, and this was in the last two years. We were developing how we were generating these forms and how we were going to handle versioning them, because when you file a new version of a form with a state, you then have to have a different version for all the policies issued after that date, but the policies issued before that date still need the prior forms. There are many, many, many different forms you have to generate, and they have to look exactly like how you filed them. We got down that path, many, many hours down that path. Eventually, we were spending so much time on QA-ing whether we had put this footer in the exact right font and the exact right place that I went back to those vendors, discussed with them again, and picked a solution that we're totally fine with. It's not my ideal way to solve the problem, but it was the right decision, because we're not differentiating in how we're versioning and generating documents. That's a crazy thing. Our customers wouldn't say, I'm picking an insurance company based upon how well it generates these form documents itself. So I think this is a constant struggle, and I think if any organization isn't constantly asking, what are the things I've built that I can throw away because I can pay someone else to handle the maintenance and support of them, then you're almost certainly keeping way too many things in-house. I think you have to be really relentless about this, and it's painful. But if you're not doing it, you're probably suffering the consequences of not doing it.

Yan Cui:   7:56
Yeah, that's especially true for a startup, where your survival depends on how quickly you're able to iterate on ideas and differentiate yourself from the establishment. I think a lot of the time the build versus buy question only moves towards build when you're a very large company, with such a high volume that it becomes imperative to optimize for cost. That's when that equation starts to lean towards building something yourself and owning it. But especially when you're trying a new market, a new idea, why would you spend half a year building something when you can just get something off the shelf that does the job for you, so you can actually test the idea and see whether or not it has any legs before you invest all that time and energy?

Joe Emison:   8:41
Absolutely. That's absolutely right.

Yan Cui:   8:43
You're also a very strong proponent of monorepos. A common question I get is, how do you do CI/CD with monorepos? I know you've got a really interesting approach to that. Can you tell us about your approach to doing monorepos?

Joe Emison:   8:58
Sure. I really strongly believe that you're best off, at least in the early days, or with a small team size, putting everything in one repository. I think it solves lots of problems. My strong belief in monorepos for small teams and for startups is very tied to my belief that deployments should be monolithic for those same small teams and startups. I think monolithic deployments are really important, because they limit the surface of what you need to debug, and they simplify deployment for everyone. They also give you great parity between different environments. So whenever you're wondering why something isn't working, if you're always forcing a monolithic deployment of everything that you have, and it's very easy to know this is what the state of all of the code was at that particular time, it becomes a lot easier to identify problems, debug problems, and have everybody working within the same world space. This is something where I can see a night and day difference. I think if an organization switches from having multiple repositories that they're deploying separately to having every single piece of code that is part of the platform or system deployed at the same time and sitting in the same repository, you immediately see close to the end of, well, it worked in my environment, or it works locally here. I also think you see the end of, well, I'm not sure how to mitigate this problem that we are finding right now in production; I'm a little worried, do we need to deploy everything out of its own repository? Do I have everything at the right version there? What is the right tag for this particular service? All of those problems go away when you have one repository and you deploy it monolithically.

Yan Cui:   11:10
So do you also find that it becomes easier to share code, because everything is deployed monolithically and you can reference shared code via symlinks, or via your path, rather than having to go through this cycle of go to the repo, work on your shared code, change it, PR it, publish a package, and then come back and update all the services and deploy them individually?

Joe Emison:   11:32
Exactly. No, that's the huge benefit, right? Yes. So you have the shared code, and everything's using it. We use Lerna. This is all Node.js. But absolutely, if you think about it this way, we have a lot of different functions, and we have front end code, and they're using the same helper functions. They're using the same data structures. Let's say we're adding some field; it's really common for us when we launch a new state to need some new piece of information. For example, Texas needs to know if you have certain types of storm shutters on your house, because that'll give you a discount. So the goal is, and we're relatively close to this, you just add storm shutters in one place, ideally your GraphQL schema, and then you just go to the place where the discounts are applied and say, if that's there, calculate the discount. Then you go to the front end, all of the interactive pieces, and you say, we're going to need you to be able to show and input storm shutters. And that's it. That is largely how it works. Real-world code can be slightly messier. But for us, that's what happens. We go, we add it in, we add it to the schema, we add it in the places that need to see it, and that's it. You're exactly correct that if these things were separated out, not only would we have to open each one of them up and add it, we'd also have to handle the alignment of versioning between them, and we would, no doubt, in development and in the real world, run into lots of incompatibilities where we hit errors because we didn't do one of those things exactly correctly.
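
To make the "add it in one place" workflow concrete, here is a minimal, hypothetical sketch of a shared Node.js helper of the kind that could live in the monorepo's shared packages directory and be imported by both the backend functions and the React front end. The file path, field name, and discount figure are assumptions for illustration, not Branch's actual code.

```javascript
// packages/shared/discounts.js -- hypothetical shared module (illustrative only)
//
// Because it lives in the monorepo's shared packages directory, both the
// backend Lambda functions and the React front end can import it, so a new
// field like "stormShutters" only has to be modelled once.
//
// The corresponding GraphQL schema change would be a single added field, e.g.:
//   type HomeDetails {
//     stormShutters: Boolean
//   }

// Illustrative rating rule: apply a discount only where the state offers one.
function stormShutterDiscount(homeDetails, state) {
  if (state === 'TX' && homeDetails.stormShutters) {
    return 0.05; // example 5% discount -- not a real Branch rate
  }
  return 0;
}

module.exports = { stormShutterDiscount };
```

A backend pricing function and the front end form would then both consume this one definition directly, which is exactly the version-alignment work between separate repositories that Joe describes avoiding.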

Yan Cui:   13:14
For those in the audience who aren't familiar with what exactly you mean by deploying monolithically, I guess the question is also, how do you structure your repo? Do you have one folder in the root for every service? And in terms of your pipeline, is it one pipeline that deploys everything in parallel, or do you do them sequentially?

Joe Emison:   13:34
Sure. Yes, so we have one directory per, you could call it, service. One of the great things about monorepos is that you don't definitionally have to define what a service is. We use AWS AppSync, and we have React front ends. So every directory name generally ends with either -BE or -FE. For example, our React front end where our customers log in and get access to all of their insurance information is account-FE. That's a directory off the root, and it has a React app within it. Then, for example, we have an AppSync API that drives our customer-facing interactions, so both purchasing as well as that account front end, and that's called appsync-BE. Within that appsync-BE directory, there's all the configuration for that AWS AppSync GraphQL API service, and all of the functions behind that service that AppSync helps drive. But we also have a directory called packages. That is a shared directory of Node.js code that just gets included by both the front end applications and the backend functions. So all of that is set up in that way. Then there's a deploy directory that has some code around deployment. When we send everything to CI/CD, we split pipelines, so every single directory essentially gets built separately, and there are some dependencies on each other. We use CircleCI for CI, so it's a relatively simple configuration to say, build this, and then for this other thing, wait for the first thing to have built. Then we run automated test suites. We use a service called ProdPerfect that develops our test suites for us. So we wait for everything to build, and we run the test suites. In order to do all this simultaneously, we have 10 staging environments. When CircleCI kicks off the main workflow to do all the building, deploying, and running of the test suites, it picks one of those 10 environments, and then it goes and runs. I think we probably have 14 different directories, or you could call them services, but that includes front end applications. We have about 14 of those that build and get deployed. Then we have two test suites that run simultaneously right now on those. That all goes back into the pull request. So we obviously iterate with code review and with those results, and then when it's ready to go, that same process is run to do the deploy to Prod.
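
To help picture the layout and pipeline Joe describes, here is a rough, hypothetical sketch of the repository root and of a simple Node.js helper of the kind that might sit in the deploy directory. The directory names, npm scripts, and environment names are assumptions for illustration, not Branch's actual tooling.

```javascript
// deploy/deploy-all.js -- hypothetical helper, not Branch's actual deploy code.
//
// Assumed repository layout (illustrative):
//   account-fe/   React app for the customer account portal
//   appsync-be/   AppSync GraphQL API configuration plus its backing functions
//   packages/     shared Node.js modules used by front ends and back ends
//   deploy/       deployment scripts (this file)
//
// Deploys every service directory into one named environment, mirroring the
// "build everything, deploy everything together" approach.
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const stage = process.argv[2] || 'staging-1'; // e.g. one of the shared staging envs
const root = path.resolve(__dirname, '..');

// Treat every top-level directory ending in -BE or -FE as a deployable service.
const services = fs
  .readdirSync(root)
  .filter((dir) => dir.toLowerCase().endsWith('-be') || dir.toLowerCase().endsWith('-fe'));

for (const service of services) {
  // Each service directory is assumed to expose its own `deploy` npm script.
  execSync(`npm run deploy -- --stage ${stage}`, {
    cwd: path.join(root, service),
    stdio: 'inherit',
  });
}
```

In CircleCI terms, the "wait for the first thing to have built" dependency Joe mentions maps to job dependencies in a workflow (the requires key); the sketch above only illustrates the deploy side.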

Yan Cui:   16:23
Okay. What would you say are some of the drawbacks? Because listening to how you describe your process, it all sounds great, but one thing that pops to mind is that your pipeline is potentially going to take as long as the slowest component that you're trying to deploy. Is that something that has ever come up as an issue for your team?

Joe Emison:   16:42
You know, it's annoying, and especially now. We recently switched to having our core insurance purchasing application be server-side rendered. The best way to do that in the Amazon environment, at least as far as our research went, was to use Lambda@Edge functions with Next.js. What this means is that everything's in CloudFront, and as anybody who's been working with CloudFront knows, a CloudFront deployment takes at least 10 or 15 minutes every time you do it. So, that said, we do deploy simultaneously, but you're correct: end to end, the deployment pipeline is as slow as a CloudFront deploy each time, and I think we're probably averaging 22 to 24 minutes on that. Then on top of that is the test suite, and the test suites take about 15 minutes to run. So we're talking about 37 minutes to go end-to-end. For us, that's not an enormous trade-off. I would certainly like it to be faster, but my general view is that if it's going to take more than about a minute, it's okay if it takes less than an hour. One of the reasons why we have these 10 different staging environments is so that we can have a lot of this going on simultaneously. Also, keep in mind that every developer at Branch has his or her own Amazon account, which is its own isolated environment to deploy into. We also have other environments that we deploy into for various QA, or staging, or testing, or development. So the challenge with the pipeline taking 30 to 40 minutes is if you're in a setting where you need a Prod deploy out quickly because there's some problem in production; that's the primary one that hurts us. There is another one that I've talked to a lot of developers about, who really want to develop something and have it reviewed and done and pushed live as soon as they're done with it. I don't actually agree with that as the right way to do development. I think the best code and the most maintainable code has mandatory code review, and once a developer has written it, I don't have any problem with it taking another day or so of reviewing and testing just to make sure that it's what it should be as code. I just don't know how you get a good code base unless you have those practices. Maybe there are companies of superstar, amazing developers where everything they write is amazing. But if you look at how journalism works, people write articles, and then they're edited. There's a saying that I certainly believe, which is: there's no thinking but in writing, and no writing but in rewriting. Again, I just don't know how you get quality code without writing and rewriting it and thinking about it. So having a 30 to 40 minute build and test pipeline as part of the code review process just doesn't seem to me to be much of an issue. There is a drawback in getting things to Prod when we're ready to get them to Prod, but we don't have that many rollbacks, and we don't have that much of a need to do emergency deploys to Prod, at least at this point. So that part hasn't been particularly painful for us.

Yan Cui:   20:08
I guess a lot of that 30 to 40 minutes is tied to how long CloudFront takes to update, and I'm sure CloudFront is going to try to catch up with Cloudflare Workers, which can deploy an update within a few seconds. So hopefully, when that happens, a lot of that time will just magically go away because it's the platform doing the work.

Joe Emison:   20:28
Yeah, the longest other service is the AppSync back end, which takes about 12 minutes. So yeah, if that CloudFront time went down, we would cut a full 10 minutes off the whole length.

Yan Cui:   20:43
I also want to touch on something you just mentioned there, that unless you've got all these superstar developers, code is probably not going to be perfect the moment they finish writing it. One of your really interesting philosophies around recruitment, which I think is very different from that of pretty much everyone else I've spoken with, is that you primarily hire junior front end developers. So what is your rationale behind this approach, and how well has it worked for you so far?

Joe Emison:   21:10
Well, the rationale is several reasons. I think the primary philosophical reason is that I've been working with software developers for about 25 years now, and certainly over the last 15 I have noticed that when you hire senior developers, or any developer you hire as an experienced developer in the tools that you're using, you're getting experience. But the other thing you're getting is, generally speaking, a belief that important decisions about application architecture, and important decisions that will really affect the future of that application, need to be in the domain of those senior hires. You then combine that with two realities today. One is that we have no good continuing education practice in software development. There's simply nothing out there that makes it simple and easy for anyone to understand what the best way to do things is today versus five years ago, or even 18 months ago. The other, and this is certainly true in America and certainly true in larger cities, is a real arrogance in software developers. It's not all software developers, but there is a real brogrammer culture in software development that is, I think, epitomized by the stereotypical response on Hacker News, which is just brimming with self-confidence, arrogance, and general derision for whatever thing they're attacking. There isn't a lot of supportive, wow, that's really interesting, I hadn't thought of that before; that goes against everything I've done to date, but maybe I should look at it. That's never a response you see. But that's how you learn, right? That's what continuing education would do.

Yan Cui:   23:53
So this is part one of my conversation with Joe Emison. Please come back next week for part two of this conversation. If you want to access show notes or the transcript for this episode, please go to realworldserverless.com. I will see you guys next time.