Saturday, April 04, 2015

Microservices and the unit of failure

I've seen and heard people fixating on the "micro" bit of "microservices". Some people believe that a microservice should be no larger than "a few lines of code" or "a few megabytes", and there has been at least one discussion about nanoservices! I don't think we should fixate on the size, but rather on that old Unix adage from Doug McIlroy: "write programs that do one thing and do it well". Replace "programs" with "services". It doesn't matter if that takes 100 lines of code or 1000 (or more, or less).

As I've said several times, I think the principles behind microservices aren't that far removed from "traditional" SOA, but what is driving the former is a significant change in the way we develop and deploy applications, aka DevOps, or even NoOps if you're tracking Netflix and others. Hand in hand with these changes come new processes, tools, frameworks and other software components, many of which are rapidly becoming part of the microservices toolkit. In some ways it's good to see SOA evolve like this, and we need to make sure we don't forget all of the good practices we've learnt over the years - but that's a different issue.

Anyway, chief amongst those tools are the rapidly evolving container technologies, such as Docker (other implementations are available, of course!). For simplicity I'll talk about Docker in the rest of this article, but if you're using something else then you should be able to do a global substitution and get the same result. Docker is great at creating stable deployment instances for pretty much anything (as long as it runs on Linux, at the moment). For instance, you can distribute your product or project as a Docker image and the user can be sure it'll work as you intended, because you went to the effort of ensuring that any third party dependencies were taken care of at the point you built it; so even if that version of Foobar no longer exists in the world, if you had it and needed it when you built your image, then that image will run just fine.
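As a minimal sketch of what I mean (purely illustrative: the Foobar library, its version and myservice.py are made-up names, and I'm assuming a Python service):

    # Illustrative Dockerfile: "Foobar" and myservice.py are hypothetical.
    FROM python:2.7
    # The pinned third party dependency is fetched and baked in at build time, so the
    # image keeps running even if this exact release later disappears upstream.
    RUN pip install Foobar==1.2.3
    COPY myservice.py /opt/myservice/myservice.py
    CMD ["python", "/opt/myservice/myservice.py"]

Anyone who pulls the resulting image gets exactly the dependencies you built and tested against.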

So it should be fairly obvious why container images, such as those based on Docker, make good deployment mechanisms for (micro)services. In just the same way as technologies such as OSGi did (and still do), you can package up your service and be sure it will run first time. But if you've ever looked at a Docker image you'll know that they're not exactly small; depending upon what's in them, they can range from hundreds of megabytes to gigabytes in size. Now of course if you're creating microservices and are focussing on the size of the service, then you could be worried about this. However, as I mentioned before, I don't think size is the right metric on which to decide whether a service fits into the "microservice" category. Furthermore, you've got to realise that there's a lot more in that image than the service you created, which could in fact be only a few hundred lines of code: you've got the entire operating system, for a start!

Finally, there's one very important reason why, despite Docker images being rather large, I think you should still consider them for your (micro)service deployments: they make a great unit of failure. We rarely build and deploy a single service when creating applications. Typically an application will be built from a range of services, some built by different teams. These services will have differing levels of availability and reliability, and different levels of dependency on one another. Crucially, there will be groupings of services which should fail together, or at least, if one of them fails, the others may as well fail too, because they can't be useful to the application (clients or other services) until the failed service has recovered.

In previous decades, and even today, we've had middleware systems that would automatically deploy related services on to the same machine and, where possible, into the same process instance, such that the failure of the process or machine would fail them as a unit. Furthermore, if you didn't know or understand these interdependencies a priori, some implementations could track them dynamically and migrate services closer to each other, maybe even on to the same machine eventually. That kind of dynamism is still useful in some environments, but with containers such as Docker you can now create those units of failure from the start. If you are building multiple microservices, or using them from other groups and organisations, within your applications or composite service(s), then do some thinking about how they are related, and if they should fail as a unit, pull them together into a single image.
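Here's a rough sketch of what that could look like. It's illustrative only: orders-service and inventory-service are hypothetical binaries, assumed to be of no use to callers without each other.

    # Illustrative only: orders-service and inventory-service are hypothetical
    # binaries that are assumed to be useless without one another.
    FROM ubuntu:14.04
    COPY orders-service /usr/local/bin/orders-service
    COPY inventory-service /usr/local/bin/inventory-service
    # Start both, and exit as soon as either one does (bash's "wait -n" returns when
    # the first background job finishes), so the container, the whole unit of
    # failure, exits and can be restarted as one.
    CMD ["/bin/bash", "-c", "/usr/local/bin/orders-service & /usr/local/bin/inventory-service & wait -n; exit 1"]

Whether you wire the processes together with a wrapper script, a supervisor or something else is a detail; the point is that the image, not the individual service, becomes the thing that fails and gets restarted.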

Note I haven't said anything about state here. Where is state stored? How does it remain consistent across failures? I'm assuming statelessness at the moment, so technologies such as Kubernetes can manage the failure and recovery of immutable (Docker) images. Once you inject state then some things may change, but let's cover that at another date and time.
