Big platform shifts are traditionally architecture/management led initiatives. Microservices is a big platform shift, but the most successful organizations embrace a bottom-up approach to adopting microservices. In this talk, Rafael will talk about the evolution of microservices at organizations, and how microservices can (and should) be adopted by organizations one developer at a time.
In the demo, Rafael uses the following open source tools:
Envoy – C++ L7 proxy and communication bus
Kubernetes – automated container deployment, scaling, and management
Telepresence – debug services locally while bridging to a remote Kubernetes cluster
Ambassador – API Gateway for microservices built on Lyft Envoy
Forge – define and run multi-container apps in Kubernetes, from source
Kelsey Evans: All right, hello everyone, and welcome to day three of the Microservices Practitioner Virtual Summit. I’m here with Rafael Schloming, who is the CTO and Chief Architect at Datawire, and he’s going to be talking about how developers can successfully adopt microservices. If you’re joining us for the first time today, just to quickly walk through some of the Zoom features that you can use: on the bottom of your Zoom screen you have a Q and A button, as well as a Chat button, so feel free to use those throughout the presentation to ask questions. We will save some time at the end to get to those. You can also post them in our summit-specific Gitter channel; the link is at the top left of your screen. It’s gitter.im/datawire/summit. With that, I’ll hand things over to Rafael.
Rafael Schloming: Alright. Thanks, Kelsey. As you can imagine, I’ve heard a lot of microservices talks over the years, and I’ve even given a few myself. There are some things you start to hear more than a few times. Microservices offer a whole lot of great benefits, things like velocity, agility, scalability, and pretty much any good thing you can think of that starts with “ility.” But, they’re also kinda complicated. You hear this over and over again. They’re fraught with pitfalls and peril, and everyone kinda knows these things now, but what not everyone knows is that they can actually be simple. It all kinda depends on your perspective.
In today’s session, I’m going to look at the essence of microservices through the lens of process, people, and technology, from a developer’s standpoint. On the process side, we’ll dig into how microservices let a team of developers deliver faster. On the people side, we’ll explore what lets multiple teams move rapidly and in unison. And on the technology side, we’ll highlight the critical tools that make these advantages possible.
Beginning with the process, it’s important to understand why this factor is so crucial. Consider the analogy of these two cars. The process that built the car on the left differed greatly from the one that built the car on the right, with the former being assembled in less than a day. If I were to focus solely on the physical components, ignoring the process, I might draw some odd conclusions. For example, I might compare the 30,000 parts of the first car to the fewer than 1,500 parts of the second and erroneously conclude that the key to building quickly is breaking the project down into numerous small components.
The car on the left was produced on a modern day assembly line with the aid of hundreds of robots, and the car on the right was produced by a very different and far more manual process. The process is why the complicated car is so much quicker to build. This is true of microservices as well, yet many people approach microservices as an architecture, and they try to refactor their monolith into what is ultimately a far more complex design, with some disastrous consequences. This happens because they reverse the causality that’s so clear to see with these cars. With software, they think velocity comes from a more complex design, when, in fact, the more complex design is a byproduct of a more effective process. Not understanding this is one of the failure modes of microservice migrations. Velocity comes from process, not from architecture.
This is why I like to talk about microservices process as service oriented development. It reminds me that the whole point of microservices is a process that lets developers far more quickly deliver functionality to users. Let’s dig into what this means, why it works, and how to do it. In order to understand what this means, it really helps to think about the differences between architecture and development as processes. Architecture is really all about upfront thinking, understanding which early decisions will be costly to change later on, and figuring out which choices are correct. This generally requires a lot of experience because it’s a really slow feedback cycle. You can wait months or years to find out how well your choices pan out. Development, on the other hand, is much faster paced. All the tools are set up to give you really rapid feedback, and if you find yourself stuck, you generally lean on your tools, like debuggers and profilers to give you more information … which is not to say there isn’t thinking involved in development, but it’s different, less theory and more experimentation.
When I talk about service oriented architecture versus service oriented development, I’m talking about a pretty extreme shift in process, from theory and upfront thinking to experimentation and measurement. This shift is what lets you build far more complex systems and robust applications much more quickly. Why does this work? To understand this, it’s helpful to step back for a minute and think about process in general as a cycle of guess and check, that eventually converges on whatever your desired output is. You can then imagine a spectrum of activities spread out in terms of the cost of any single iteration of this cycle. At the expensive end of this spectrum, you have something like rocket science. Not surprisingly, you tend to see lots of upfront thinking. At the cheap end of the spectrum, you have something like video games, where there’s almost no cost. Over here, pure button mashing can be a pretty effective approach.
Now, it turns out that the magic of Moore’s Law has been pushing software steadily from left to right on the spectrum at a pretty astonishing pace. I remember once hearing what it was like to program with punch cards, and I was really shocked to learn you could only run your code twice a day. The entire facility had only one computer, and the programs themselves were physically quite unwieldy. They had to drive a big van around to all the offices and gather up punch cards to take to the computer operators. Programming was really a different discipline in this world. You drew out your control flow on pen and paper, and you very carefully diagrammed your data structures ahead of time, and you were forced to keep things really, really simple. This was much more like rocket science than what I think of as programming.
Of course, you fast forward to today, and not only do we expect our programs to compile in real time as we type them, we expect to be able to run comprehensive test suites at a moment’s notice, step through our control flow, and inspect the contents of memory directly with our debuggers. By depending on this fast build-test cycle and this great visibility [inaudible 00:07:14] environment of our program, we can build, by punch card standards, unimaginably complex programs really, really quickly. This is why the process behind microservices can be so effective. By leveraging the capabilities enabled by the cloud to hack the guess and check cycle for distributed systems, we can transition from service oriented architecture to service oriented development. The same distributed systems that took a large team weeks or months to provision 15 years ago now take only a few minutes. This change in cost enables a much more data-driven and developer-led process.
The work that was previously spread across network engineers, [inaudible 00:07:59], system architects, and developers can now really all be done by a single person, yet many organizations still work the same way they did 15 years ago. How do we build a more effective process to take advantage of this? We can start with the test-driven development of traditional software and model and tweak our guess and check cycle from there. Let’s … Excuse me. Let’s think about some of the differences between software and software as a service and their impact on our ideal process. In both cases, we’re building something complex. We want to check our work really frequently, making very incremental changes with frequent checkpoints. With traditional software, your guess and check cycle tries to converge on one objective. Let’s call it correctness. This is exactly what traditional software tests check: that valid inputs produce valid outputs. With software as a service, you have more objectives than just correctness. You need to consider things like performance, availability, and compatibility. While we can imagine doing lots of upfront testing for performance with fancy load simulators, this gets expensive really, really quickly.
There’s really no way to do upfront testing to measure the impact a given change has on availability and compatibility. Users always depend on things you don’t anticipate. There’s always a first time in production for any piece of code, and this is key. Our service oriented dev cycle needs to include delivering each change incrementally into production and measuring its impact on all these extra factors, while simultaneously preventing unacceptably bad consequences of code running in production for the first time. Now, that’s kind of a big mouthful, so let’s write it all out and fill in the blanks. The output of our process is a continuously improving service. The guesses we make as developers are incremental improvements. The check is not just traditional tests; it’s also measuring the impact on availability and users in production.
There are three categories of tools we need to support this. We need fast deployment to help us quickly deliver these incremental changes into production. We need good observability to let us measure the impact of the change, and we need resilience to ensure that any negative impacts are not catastrophic. There’s a lot of really fancy microservices related technology out there, but pretty much everything falls into one of these buckets. This is part of the value of understanding the process because what makes you go fast is this process, not any one tool. Adopting a little bit of technology in each of these buckets is actually way more valuable than something super fancy in just one area. Forget about the architecture, and figure out how to tune your assembly line.
Focusing on release frequency, delivering incremental functionality all the way to the end users, is key, but filling in the blanks in our guess and check cycle tells us a lot more than this. It tells us users have a big impact on our process. As soon as you have users, or more precisely, users that your business cares about, you need to adjust your process to measure user impact, often with things like canary workflows. The only way to do this is with real users. This takes time, and stability versus progress becomes a fundamental trade-off for any single service. This is what puts the micro in microservices. In order to cheat this trade-off, we create multiple services, and this turns our application into a network of these microservices. The thing is, you generally don’t hit this limit until you start growing the number of people working on a single service, which means you have two hard problems to solve at once: how to scale your engineering organization effectively, and how to build your application as a network of services. This is where it becomes really important to understand the people factors.
Now, most people talk in fuzzy terms when it comes to the human factors in microservices, things like culture, Conway’s Law, and cross-functional teams. There’s actually something a bit more rigorous going on here. From a people perspective, microservices is largely a story about how you can make teams as autonomous as possible, while still contributing to a single coherent whole. I’m gonna talk about what this means, why it works, and how to do it. Of course, there’s no such thing as complete autonomy, but in this context, I’m really talking about the autonomy to pursue a single well-defined business level objective by any technical means necessary.
What does this mean? It’s that same cheap guess and check cycle, driven by the cloud, that enables this greater autonomy. The labor of maintaining infrastructure used to be a limiting factor, but today you don’t need to share things like databases, and that means you can change your data models freely without breaking other teams. You can fit your data model to your particular objective, and you can choose different data technologies entirely, but technical autonomy means more than just this. Centralized architecture can actually be just as much a source of technical coupling as shared data. For example, a top down mandate that one service owns all user data, or that all service calls are asynchronous, can introduce technical coupling and [inaudible 00:13:46] mismatches that slow down your teams. Autonomy needs to extend to service boundaries as well.
Now, there are two corollaries that come along with this greater technical autonomy. First, you need to pack a much greater breadth of technical knowledge into a single team. This is why people talk about knowledge sharing, dev ops, and full stack engineers. The potential to automate so much of what was previously manual about building large scale distributed systems actually makes education a limiting factor. This is why you need cross functional teams; they are one of the most effective ways to build engineers with the breadth of knowledge necessary. With that breadth of knowledge packed into these small autonomous teams, each team can move way faster.
This leads to a second corollary: lots of autonomous teams going really fast is great, but the more moving parts there are and the more connections there are between them, the less likely it is that the system as a whole will be healthy at any given point. This leaves us with a paradox. The very thing that makes us go faster as an individual team also makes it way more difficult to assemble the outputs of all of our separate teams. What we need is some way to ensure the health and integrity of our overall application without creating a bottleneck. We do this by distributing this accountability for the system as a whole across all of our individual teams. This is how microservices, when done right, helps you achieve massive organizational scale and more effectively divide work at a small scale.
Distributing accountability seems like an oxymoron, so why does this actually work? To understand this, we need to look at how health aggregates in our application. The problem areas arise whenever we have dependency chains. You can think of this almost like Christmas tree lights. Whenever any one bulb is flaky, the whole chain starts to be in trouble. In practice, there may be a myriad of ways to address these problematic dependencies. You can fix them at an architectural level by eliminating or changing the nature of the dependency to reduce coupling, or you can fix them at an operational level by ensuring foundational services remain extremely healthy. In any application, you’re gonna likely need a mix of these techniques in order to maintain the integrity of the system as a whole. This is the God’s eye view. The God’s eye view can actually get us into trouble here. When you’ve got lots of people working on lots of services, and these sorts of stability issues arise, it becomes really tempting to put some of those people in charge of fixing these sorts of problems. This can be a real trap. When you do this, you end up with centralized bottlenecks and pathologically misaligned incentives.
A centralized team that’s responsible for the health of the system as a whole can easily end up working against all of the teams that are responsible for improving the individual parts. We want to be able to distribute both this operational and architectural work, so to avoid this trap and understand how distributed accountability can actually work, it really helps to look past this God’s eye view and consider things from the perspectives of each team. Remember our process from before: each team is responsible for multiple objectives, ensuring that the functionality, availability, and user impact of each change to their service is acceptable. When we distribute responsibility, we’re adding another objective. Each team must ensure that their changes don’t mess up the application as a whole. Now, from our two perspectives, we end up with two problems that arise whenever these dependencies cause issues. Team C’s problem is that the health of its service is suffering. Team D’s problem is that it has no way to measure or mitigate the cascading impact of changes to D.
These two problems have two solutions in two different domains. The solution to Team C’s problem is squarely in the organizational domain. We distribute operational responsibility: give Team C a pager, and they will find out that Service C is suffering. This also tells us how that God’s eye architectural work gets distributed. If Service D breaks Service C enough, Team C will be forced to change the nature of its dependency on D in order to eliminate that coupling. The solution to Team D’s problem is squarely in the technical domain. We need tooling that gives Team D better visibility into the impact of changes on the overall application. This is one of the reasons why microservices is so tricky. You have these two problems you’re solving simultaneously: scaling an organization and building a complex distributed system.
The solution to the apparently technical problem of how to make your system robust actually lies in the organizational domain, while the solution to the apparently organizational problem of how to divide up all the work has pretty deep dependencies in the technical domain. This is why it works. Teams with excessively coupled services are motivated to reduce that coupling, in order to maintain their individual service health, and with the right tools, teams with foundational services can avoid causing the sorts of cascading failures that are problematic for the application as a whole.
How do you do this? Since taking advantage of microservices requires changing the way you work, you need to start with organizational change, rather than starting with technology. You don’t want to refactor your monolith. Your monolith embodies too much of your old way of working. You really want to start with a single team and an empty Git repo. Define a clear business objective for your first service: define who uses the service, and what it helps them do. Give your starter team full autonomy to solve this problem, and then spread best practices from there. If you do this right, you will quickly start to build many services, and you may run into system-level cross cutting issues. When this happens, be careful of that God’s eye view. You don’t want to centralize any work, either architectural or operational, that’s going to need to scale along with the number of services you have. This will create a bottleneck for your organization.
This brings us to technology. There’s a lot of technology in the microservices space. This is largely because many successful microservices companies started out with organizational process changes and then built the tools they need along the way in order to support their way of working. Then they open sourced all these tools and didn’t really provide any assembly instructions. I’m gonna dig into the technology, but I’m gonna do that from the perspective of what a developer needs to work effectively. I’m gonna illustrate this by showing you the dev workflow with a stubbed out application on a real microservices stack.
For my tech stack, I’m gonna use Kubernetes because this lets me describe arbitrary infrastructure in a very simple, high level, and flexible source format, otherwise known as [inaudible 00:21:08] YAML. When I update this YAML, Kubernetes can automatically compute the difference and apply any changes to my infrastructure in seconds, if I want. This fits the workflow I’m used to really, really well. It’s almost like I can check hardware into Git and [diff and patch it 00:21:26]. Kubernetes doesn’t operate on source code, though. It operates on containers, so I need a standard way to build all my source code. For this, I’m using Docker. The other piece of the picture I need is Envoy. I’ll be using Envoy as an L7 router and API gateway, and you’ll also see me use Forge, which is a tool I built to help streamline my workflow, as well as the auth plugin from Ambassador.
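To make that concrete, here’s a minimal sketch of the kind of manifest this workflow revolves around; the service name and image are hypothetical placeholders, not taken from the demo repo:

```yaml
# Hypothetical Kubernetes Deployment: checked into Git alongside the
# service's source, then diffed and applied like any other code change.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tasks
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tasks
  template:
    metadata:
      labels:
        app: tasks
    spec:
      containers:
      - name: tasks
        image: registry.example.com/tasks:v1   # container built from source with Docker
        ports:
        - containerPort: 8080
```

Running kubectl apply -f on an edited copy of a file like this is the “diff and patch” step: Kubernetes computes the difference from the running state and rolls out only what changed.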
Here’s the application we’re gonna deploy and modify. This actually wasn’t built for this demo; it’s an application that we’re building out as a way to POC various ideas, in this case authentication and canary workflows, as well as to function as a working example that people can use to get started. I’m going to switch from slides to the demo now, but this demo is actually going to run against a live service, so if you want to cut and paste this URL, you can follow along with it.
Now, if you’re visiting that URL, and you see that there’s nothing there, don’t be alarmed. That’s actually intentional. On this tab in the lower left corner, I have a loop that is [curling 00:23:04] that URL, and it is not able to get anything either. That’s because unlike most demos, which start with a lot of pre-initialized and pre-provisioned resources, I’m actually starting from scratch here. All I have is source code and a Kubernetes cluster. I’ve also assigned a DNS name so that you don’t need to use an IP address to talk to me.
I’m going to actually deploy this application from source code. Before I do this, let me cover a little bit about what the application does. This is the topology of the application you’re seeing. It’s currently four services: Envoy functioning as an API gateway, delegating authentication to a custom service called [IO Custom Authentication Service 00:24:02]; behind that, we have a tasks service and a search service. All of this is in my GitHub repo. I’ve laid this out as a mono repo, but everything you’re seeing here works fine with multiple repos or a single mono repo.
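For a sense of how requests get from the gateway to a backend, an Ambassador-style route mapping, attached as a Kubernetes Service annotation, looks roughly like this; the names are illustrative rather than copied from the demo repo:

```yaml
# Hypothetical: routes /tasks/ requests arriving at the Envoy-based
# gateway to the tasks service; the search service would get a
# similar Mapping with its own prefix.
apiVersion: v1
kind: Service
metadata:
  name: tasks
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: tasks_mapping
      prefix: /tasks/
      service: tasks
spec:
  selector:
    app: tasks
  ports:
  - port: 80
    targetPort: 8080
```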
I can deploy this application and … you’ll see in a few seconds Kubernetes is actually going to roll out all of the infrastructure I need to start serving this and … yes, you can see now my loop is actually starting to come back. If you’re following along out there and hitting this page, you’ll be able to see that this comes back with hello tasks and the uptime. Actually, sorry, you won’t be able to see that, because the authentication logic will not let you in. I can show you how easy it is to make a change, though. You can see here I can just change my authentication plugin to no longer require authentication. This is my authentication service. Redeploy, and … I can now get rid of my authentication header … and you’ll see even without the header, I can get into my service now. That’s an example of what I would call a stage one, or early, workflow.
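The talk doesn’t show the diff itself, but the kind of change involved could be as small as a one-line toggle in the auth service’s manifest; the REQUIRE_AUTH variable here is invented purely for illustration:

```yaml
# Hypothetical: flipping one environment variable on the custom auth
# service disables the authentication requirement; redeploying then
# pushes that change live in seconds.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
    spec:
      containers:
      - name: auth
        image: registry.example.com/auth:v2
        env:
        - name: REQUIRE_AUTH   # hypothetical flag read by the auth service
          value: "false"       # was "true"
```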
My dev workflow here, for something that doesn’t have a whole lot of users: I really want to be able to do rapid prototyping, push out changes really, really fast, and have it be that easy. For something where I do have users, I want to actually be a little bit more cautious. Let’s pretend now that you are all my users. This curl script is my users, and I want to make a change in a much more careful way. I want to roll out a canary first. Let me look at my tasks … service, and let’s say I want to roll out a canary release of this. Now, it just so happens … I have actually configured this Envoy to split traffic 90%/10% between two of my deployments.
Now, these deployments have just been running the same version of the code, but because 10% of the traffic is always going to my canary deployment, I can just as easily roll out this new version of my service, and you’ll see that now that is a new version. You can see on the left here that 10% of my traffic is going to this canary. This control I get from Envoy, the ability to split my traffic, lets me really mitigate the potential impact of any change like this on my users. In practice, this impact is gonna be much more domain-specific, and I’m gonna want to build much more application-specific ways of measuring it, but it’s really that L7 router functionality that’s giving me the basic building blocks to do this and to roll changes out incrementally. Because Envoy has a lot of customizability and pluggability, I can dynamically switch the traffic back and forth if I want to, but because I’m really just trying to illustrate this way of working, I’ve started with this simple example.
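As a rough sketch of what the Envoy side of that split can look like, a weighted route along these lines divides the traffic; this uses Envoy’s v2-style YAML config, and the cluster names are illustrative:

```yaml
# Hypothetical Envoy route: 90% of /tasks/ traffic goes to the stable
# deployment, 10% to the canary. Changing the weights shifts traffic
# without redeploying either version of the service.
route_config:
  virtual_hosts:
  - name: tasks
    domains: ["*"]
    routes:
    - match:
        prefix: "/tasks/"
      route:
        weighted_clusters:
          clusters:
          - name: tasks_stable
            weight: 90
          - name: tasks_canary
            weight: 10
```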
Excuse me. To summarize, adopting microservices is really about changing the way you work to take advantage of these new capabilities. At this point, education is probably the biggest bottleneck for any organization looking to make this shift, so start with that organizational change, create a small autonomous team, focus on release frequency, and then adopt the technology on demand to fit your workflow. This is really the formula that we’ve seen for success at just about any company that’s adopting microservices. This is what we recommend. Please let me know if you have any questions. The source code for the application I just illustrated, and the tooling and workflow around that, will all be available in our GitHub repo as well. Thank you.
Kelsey Evans: Okay, great. Now, we’ll take some time to answer questions. We’ve had one come in, but if you want to take a minute to type your questions into the Q and A button or the chat, we can answer them now. The first question is what would you recommend as the best way for a company to get started with microservices?
Rafael Schloming: Really, I think start out by defining an actual business level objective that’s important to the company, identifying a team to work on that, and giving them the autonomy they need to do it. I think some of the technology choices are getting easier now; building on top of Kubernetes, because of the momentum it has, is pretty easy. But a lot of the mistakes that we see are when people try to start at the technology end first. There’s this sort of analysis paralysis you can get if you start by trying to adopt all the fancy tooling that’s out there, because that fancy tooling has knobs and bells and whistles that you’re not gonna need until you have 50 or 100 services. You can spend a lot of time trying to figure out how to actually use them. It really starts with changing the way you work, focusing on that release frequency, figuring out whatever you need to do to drive that release frequency up, and then taking that success with a single service and replicating it.
As that first service you start with goes through different stages, you’re gonna start out in this rapid prototyping stage, and most of your obstacles to deployment there are just gonna be the mechanical friction around doing it: putting together all the tools and creating the scripting and glue you need to match your own rapid development workflow. As soon as you get to the stage where you have users, you’re gonna need to build out visibility into the impact of your [inaudible 00:32:15] users in order to maintain that rapid release frequency, and you’re gonna start to need to build out canary workflows.
If you try to build out those things too early, before you actually have users, then you’re gonna end up just being slow and probably not building the right metrics and the right visibility you need, ’cause that visibility really is domain-specific. It’s specific to a service and its users. Then, once you replicate that, once you have a lot of services, you start hitting your stage three, or system level, issues. Then you need to start thinking about a service mesh, but one of the things is most companies that went through this journey did it organically, and a lot of the service mesh technology out there was built in a way that actually makes it retrofittable. Again, if you start with the advanced tools early, instead of starting with changing the way you work, you can spend a lot of time that you don’t need to spend.
Kelsey Evans: Okay. Next question is how frequently should teams try to deploy updates to their microservices?
Rafael Schloming: I’m sorry. How-
Kelsey Evans: How frequently should teams try to deploy updates to their microservices?
Rafael Schloming: This is something where I think the more frequent the better. This is one of the counterintuitive things about those merging roles. There are two different mindsets that are coming together with development. Your traditional software development mindset is that you want to be able to anticipate, upfront, every single possible thing that might go wrong, get the 100% test coverage, get the tests passing, and then you’re done. Getting too much into that mindset will cause you to build up bigger changes.
Now, if you look at things from the operations side, or from the operations perspective, then maintaining availability really doesn’t have anything to do with having perfect code or anything like that. It’s really more to do with how quickly you can recover when things go wrong, because things are guaranteed to go wrong. Machines fail. Networks fail. Of course, today, the machine and network failures are all handled automatically. Kubernetes will take care of that for you. If one of my pods dies, Kubernetes is gonna restart it automatically. That means your biggest source of failures comes from the changes in the software that developers are making, and the person who best knows how to fix that is the developer who made it. This is one of the reasons why combining those roles, that separate developer role and separate operations role, into a single dev ops, or cross-functional team, cross function [inaudible 00:35:52], is so important: the more changes you make, the less risk there is, when you have the ability to mitigate the impact of any one single change.
That’s one of the key things that these basic tools are doing. I’ve decided, in my example here, that with my tasks service, I’m willing to sacrifice up to 10% of the traffic on the possibility that my canary release is totally bogus. Now, in practice, I would probably limit that to 1% or something much smaller, but that means it takes more time for me to get the confidence to turn that canary into my stable release. I would say the more frequent the better. There isn’t necessarily a fixed answer, but this is one of those things where you dramatically reduce risk if you can break big changes into smaller ones.
Kelsey Evans: Okay. Have you done any performance benchmark on Envoy?
Rafael Schloming: I have not personally, but there have been … plenty of performance benchmarks done, I believe, by Lyft, or at least … The only one I can recall offhand is I think it’s supposed to add less than a millisecond of latency, or around a millisecond of latency. Don’t quote me on that, ’cause I’m just digging that up from memory.
Kelsey Evans: Okay. Can you explain more about what Forge was doing in the demo?
Rafael Schloming: Sure. Basically, Kubernetes runs on containers, and there’s a whole bunch of mechanical steps necessary to get your source code into containers. This can be a challenge because you generally need to reduce that friction somehow. Without Forge, you pretty much need to get your source code into a container with a Docker build, push that container to a registry, update the Kubernetes YAML to reference the new Docker image, and then apply that in order to actually roll out a change. You can script this with a simple shell script, but there are some benefits to using a slightly fancier tool and making this a little bit more streamlined.
Forge basically does those steps for you, but it also does some caching along the way. For example, it computes the Docker image tags based on a hash of the input source, which means if you ever build the same source twice, it’s just gonna reuse the same container, and that lets you spin up a whole network of services much quicker, ’cause in most cases, all of them are already gonna be built. It also gives you a standard way to spin up other people’s services, because it’s using Docker to do the builds. It encourages a single standard way to go from source to container, and this is something that can be a problem in more organically developed microservices tooling. You end up with different tool chains in different repos, and for good reasons.
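To illustrate the caching idea (this is a sketch of the pattern, not Forge’s actual template syntax or output), the manifest that ends up applied references an image tag derived from the source itself:

```yaml
# Hypothetical rendered manifest from a Forge-style deploy: the image
# tag is a hash of the service's source tree, so deploying unchanged
# source resolves to an already-built, already-pushed container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: search
spec:
  replicas: 1
  selector:
    matchLabels:
      app: search
  template:
    metadata:
      labels:
        app: search
    spec:
      containers:
      - name: search
        image: registry.example.com/search:6f3a9c1   # tag = hash(source)
```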
There are some things in the Python ecosystem you might want to use, some things in the Node ecosystem, some things in the Java ecosystem, and maybe some services in Go, but then you have different ways of building all of these, and if anyone wants to set up some of these services in an isolated environment to do testing or something like that, they need to learn how to do all these different builds. By sticking all that building behind a Dockerfile as a standard way to produce builds, and providing streamlining around that, it can make it really, really easy to stand up these kinds of applications from source.
Kelsey Evans: Okay. For the next questions, can you come a little bit closer to the microphone? It’s getting sort of hard to hear you.
Rafael Schloming: Sorry.
Kelsey Evans: The next question is: can you discuss how your point of view on the pattern or process of service oriented development for microservices differs from the traditional, complex service oriented architecture?
Rafael Schloming: Yeah. Service oriented architecture … I think the biggest difference is that service oriented architecture really often comes from a place of centralized architecture, that sort of top down definition of services and boundaries. This is something that you don’t see in microservices organizations. Because of that technical autonomy, centralized architecture can turn into a bottleneck.
It’s a little bit more of … It’s kind of analogous to the way the internet itself functions. We have this psychological distinction between other teams in our organization and external services. In my example demo here, my auth service is actually delegating authentication to Auth0, so that’s an external service. Now, if I’m responsible for building something and Auth0 starts being flaky for whatever reason, or Auth0 doesn’t provide the functionality I need, I’m not gonna make my objective, my deliverable, dependent on Auth0 creating whatever functionality I depend on. I’m just gonna work around that and keep on going.
If there were a central architecture team responsible for both, you might actually have that mandate: oh no, authentication is the domain of Auth0, you need to wait for this functionality. That can create coupling, and that can slow you down. When you’re trying to build applications at the scale of microservices applications, architecture loses its role, because a lot of the point of architecture is to keep a system comprehensible to a single mind, to keep it simple, and microservices applications aren’t really comprehensible to a single mind. It’s really about a way of working that lets you build applications that are much larger than that.
Kelsey Evans: Okay. The next question is how do you feel about monitoring systems to monitor your microservice environment?
Rafael Schloming: That’s obviously a key part of the observability that you need, and that’s not something that I’ve shown here, but it’s definitely a key piece of the story. Really, with monitoring, it’s the same story as with all the other technologies. You can start out really, really simple. Kubernetes has built-in capabilities that give you per-service liveness probes and health probes, and a basic ability to get logs and some simple metrics. You can start out with these, and then you can get a whole lot fancier than that, with all sorts of distributed tracing and ways to measure the latency of complex, deep distributed call graphs and figure out what things are slow, sort of [inaudible 00:45:07], open tracing, and that kind of thing. There’s a lot of depth you can get into, and I would just say, if you’re getting into microservices, start out with the simple options that are built in, get as far as you can with that, and then add in the fancier stuff as you need it.
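For reference, the built-in probes he mentions look something like this in a pod spec; the endpoints and port are hypothetical:

```yaml
# Hypothetical: Kubernetes restarts the container when the liveness
# probe fails, and withholds traffic while the readiness probe fails.
apiVersion: v1
kind: Pod
metadata:
  name: tasks
spec:
  containers:
  - name: tasks
    image: registry.example.com/tasks:v1
    livenessProbe:
      httpGet:
        path: /healthz    # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready      # hypothetical readiness endpoint
        port: 8080
      periodSeconds: 5
```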
Kelsey Evans: All right. Then, it looks like this is the last question. What are the authentication mechanisms used in Envoy?
Rafael Schloming: Oh … What I’m using here is a plugin. There’s a bunch of … I believe what’s built in to Envoy is that it can do [TLS-based client auth 00:45:58], or client validation, and so that’s pretty useful in creating secure networks of Envoys. For something like Auth0, you really want to be able to insert domain specific logic around authentication, and so for that, I’m actually using a plugin for Envoy from the Ambassador project, which makes it more customizable for this API gateway role. In that case, for every connection, it will delegate authentication to this external service, which allows me to customize the logic, and it allows me to have that authentication logic at the edge of my network, so that none of my internal services need to worry about or rewrite any of that authentication logic.
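For a rough idea of what wiring in that kind of external auth looks like, an Ambassador-style AuthService configuration is along these lines; the auth service name and port here are illustrative:

```yaml
# Hypothetical: tells the Ambassador/Envoy gateway to check each
# incoming request against an external auth service first; a non-200
# response blocks the request at the edge, so internal services never
# have to implement authentication themselves.
apiVersion: v1
kind: Service
metadata:
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: AuthService
      name: authentication
      auth_service: "example-auth:3000"
      path_prefix: "/extauth"
spec:
  selector:
    service: ambassador
  ports:
  - port: 80
    targetPort: 80
```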
Kelsey Evans: Great. Okay. Lots of talk about Envoy and lots of questions about Envoy. Good news is tomorrow, during our virtual summit, we will have Matt Klein presenting, who’s the author of Envoy and is going to talk about the mechanics of deploying Envoy at Lyft. If that’s something you’re interested in, definitely join us tomorrow. That will be at 1:00 p.m. Eastern. With that, we’ll sign off for the day and see you all tomorrow. Thank you, Rafael.
Rafael Schloming: Thank you.
Try the open source Datawire Blackbird deployment project.