Peter Marklund's Home
The Maintenance Burden of Microservices
Autonomous teams are a popular idea in the industry today, one that tends to be associated with microservices. Autonomous teams fully own their services and are trusted to make their own decisions. The idea is that this will make them more motivated and productive. I fully support this idea. There are many other factors that drive microservices adoption, such as scalability, performance, fault tolerance, and the architecture of cloud infrastructure (e.g. the emergence of serverless functions etc.). However, multiple challenges arise when a team goes from owning a handful of services to owning dozens. Unfortunately, I believe that when a team splits up its services on a micro scale it is setting itself up for a heavy maintenance burden.
When creating a new microservice we as developers tend to feel really good about ourselves and we can be quite productive. We enjoy how seemingly isolated and modular microservices are, and we enjoy doing greenfield development. If we are lucky we may even get to use our favorite language, framework, coding conventions, or infrastructure. We feel free, as we are seemingly no longer bound by legacy systems. Let's imagine a different scenario down the road, though, where we find ourselves having to maintain a large number of microservices that have accumulated over several years and generations of developers with different preferences - many of whom have left. This scenario is obviously not quite as attractive...
Here are some challenges with microservices:
- Duplication and boilerplate. All the code that does not constitute the essential business logic of your services will tend to get duplicated each time you create a new microservice. You can take steps to reduce this boilerplate, of course, but in practice there tends to be a lot of it, and the problem tends to grow over time.
- Weak integrity. When working within a single app we are assisted by the language, the IDE, the compiler, and the tests in ensuring the integrity of function calls across modules. When sending messages between microservices over the network (e.g. with REST API calls or message queues), maintaining that integrity is much more difficult. Also, any change to a REST API or a message needs to be backwards compatible. This puts a significant constraint on our ability to change and refactor the system over time.
- Difficult debugging. You typically don't have a stack trace when debugging errors in a microservices environment, and this obviously makes debugging more challenging.
- No system-level regression testing. Microservices lead to lower end-to-end, system-level test coverage, so you cannot be as confident that the system as a whole continues to work when you make changes. Keeping the functionality of the entire system intact is obviously what matters to end users and thus to your business. Developers will typically test microservices in isolation (and possibly in a shared staging environment), ship them to production, and rely on monitoring to ensure that they work well. The ability to spin up an entire environment every time you make a change is typically not there. Unfortunately, the independence of microservices is often an illusion, as microservices tend to depend on each other in many unspecified and undocumented ways. For example, to scale one service you may need to scale its dependencies, and one microservice failing can cause other services to fail (unless good circuit breaking is in place). Also, it's quite common to see microservices being integrated via a shared database or file storage, even though this is supposed to be a strict no-no.
- Learning curve. As a developer you may struggle to see the full picture of the system, as it's not expressed in code. If you are lucky you may have access to a few architecture diagrams, but they typically cannot be relied on to be comprehensive and up-to-date. The fact that services get built with different languages, frameworks, versions of libraries, directory structures etc. can be a blessing but also a curse. It leads to duplication and a much greater learning curve and cognitive load for developers.
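The backwards-compatibility constraint on APIs and messages mentioned above can be made concrete with a small sketch (the event shape and field names here are hypothetical): adding an optional field to an event is safe for existing consumers, while renaming or removing a field is not.

```javascript
// Hypothetical order event, v1 as originally published:
const v1Event = { orderId: "o-1", amount: 100 };

// v2 adds an optional "currency" field - a backwards-compatible change:
const v2Event = { orderId: "o-1", amount: 100, currency: "USD" };

// A consumer written against the v1 contract. It checks only the fields
// it knows about and ignores the rest, so v2 events still work.
// Renaming "amount" to "total" in the event would break it.
function handleOrder(event) {
  if (event.orderId === undefined || event.amount === undefined) {
    throw new Error("missing required field");
  }
  return { id: event.orderId, total: event.amount };
}

console.log(handleOrder(v1Event)); // works
console.log(handleOrder(v2Event)); // still works: extra field is ignored
```

Consumers that read only the fields they know and tolerate unknown ones are what make additive changes safe; inside a single codebase the compiler and tests would have enforced this for you.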
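The circuit breaking mentioned above can be sketched in a few lines (the names and thresholds are illustrative, not from any particular library): after a number of consecutive failures the breaker opens and fails fast, so a struggling dependency doesn't drag its callers down with it.

```javascript
// Minimal circuit-breaker sketch. Wraps an async function; after
// `maxFailures` consecutive failures the breaker "opens" and rejects
// calls immediately until `resetMs` has passed.
function createBreaker(fn, { maxFailures = 3, resetMs = 10000 } = {}) {
  let failures = 0;
  let openedAt = null;

  return async function call(...args) {
    if (openedAt !== null) {
      if (Date.now() - openedAt < resetMs) {
        throw new Error("circuit open: failing fast");
      }
      openedAt = null; // half-open: allow one trial call through
    }
    try {
      const result = await fn(...args);
      failures = 0; // a success closes the breaker again
      return result;
    } catch (err) {
      failures += 1;
      if (failures >= maxFailures) openedAt = Date.now();
      throw err;
    }
  };
}
```

A caller would wrap its HTTP client in `createBreaker` so that when a downstream service starts timing out, requests fail fast instead of piling up and cascading the failure.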
Let's review the action points and decisions involved in creating a new service:
- Choice of programming language, frameworks and libraries and their versions and how to keep those up-to-date
- Coding conventions, folder structure and naming conventions
- How to do testing, linting, and building
- How to do configuration (where to store secrets, env variables, config files etc.)
- How to do logging
- How to setup deployment and infrastructure (including choice of cloud provider etc.)
- Setting up a build pipeline (including potential choice of CI provider, e.g. CircleCI or GitHub Actions)
- How to do monitoring
- How to do documentation (README files etc.)
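To make just one of these decisions concrete, here is a minimal sketch of the kind of configuration loading every service ends up re-implementing (the precedence order and variable names are assumptions for the example): defaults, overridden by a config file, overridden by env variables.

```javascript
// Hypothetical config loader: layers defaults, a config file, and
// environment variables, with env variables taking highest precedence.
function loadConfig(env = process.env, fileConfig = {}) {
  const defaults = { port: 3000, logLevel: "info" };
  return {
    ...defaults,
    ...fileConfig,
    ...(env.PORT ? { port: Number(env.PORT) } : {}),
    ...(env.LOG_LEVEL ? { logLevel: env.LOG_LEVEL } : {}),
  };
}
```

Trivial on its own - but every service makes these choices independently, and twenty services can end up with twenty slightly different precedence rules.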
Now suppose you want to make a change in any of the areas listed above. Let's say you want to change cloud provider, or build pipeline, or the framework that you use, or upgrade the version of your programming language. Instead of being able to make this change with one or a handful of PRs, you end up needing to make dozens of PRs. Or - the most likely scenario - you don't have time to create all those PRs, and your services just grow more inconsistent over time.
Let's think about everything that is not core business logic in a service. In other words, let's think about the incidental complexity, the implementation details, and the boilerplate. To make things concrete I will use a Node.js API hosted on GitHub and deployed with Docker on AWS as my example. Here is an incomplete list of boilerplate for such an API:
- `package.json` - scripts for testing and running the server and all libraries and frameworks and their versions
- `.npmrc` - configuration for the package manager
- `.nvmrc` - node version
- `.env` - env variables
- `config` - configuration files
- `Dockerfile` - Docker configuration
- `.gitignore` - git config
- `.dockerignore` - Docker configuration
- `.circleci` - build pipeline
- `.eslintrc` - linting
- `jest.config.js` - test config
- `cdk` - infrastructure/deployment code or config (can be thousands of lines)
- `server.js`, `routes.js` - code to start a web server and do routing etc.
- `db.js` - code to talk to the database
- `swagger.json` - API documentation
What is the ratio of essential business logic code to boilerplate in a microservice? Well that varies but it's not uncommon for there to be at least as much boilerplate as business logic.
As a small development team, I think the goal should be to maintain only a handful of services. Sometimes the architecture will force us into having more services, or the team may simply have too much on its plate, but we should try to avoid that if we can.
In this post I've mostly talked about the duplication problem, but another important aspect is that microservices come with distributed computing and a fundamental increase in complexity. Ironically, this may actually be one reason why developers are attracted to this architecture. After all, developers tend to be drawn to complexity and the challenge of solving hard problems.