How to get burned by moving to a microservice architecture.
Microservice architecture has, finally, been made easy with technologies like Kubernetes, Docker, and WSGI.
Larger companies have been building out distributed computing architectures for 30 years. DCOM, EJB, CORBA and the like made inroads into corporate deployments, but tended to be language- or OS-specific and restrictive in ways that slowed adoption.
But recently, I’ve seen even the smallest SMB shops making extensive use of coordinating microservices.
For companies, this is a game changer in terms of scaling productivity. Yes, having simple, well-defined interfaces allows separate teams to build out services in parallel.
But this is exactly where things get dicey. I spoke to a developer who recently built an app in Flask, spun up a Kubernetes cluster, and, after launching 40 instances, was able to keep up with a workload of 8,000 connections per second. While it was easy to scale, I had to wonder … what was being processed, and why 40 instances?
The answer was: poor design, a misunderstanding of the tech (Flask's built-in server shouldn't be used as a production web server), and far too high a communication-to-compute ratio (tons of small requests to the database that needed to be replaced with one well-designed, optimized query whose results were fanned out to workers).
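To make that last fix concrete, here's a minimal sketch (the `quotes` table and its columns are made up for illustration) of the difference between issuing one tiny query per item and running a single well-designed query whose results are fanned out to workers:

```python
from concurrent.futures import ThreadPoolExecutor
import sqlite3  # stand-in for whatever database the real service talked to

# Anti-pattern: one tiny query per item, each paying a full round trip.
def price_for(conn, ticker):
    row = conn.execute(
        "SELECT price FROM quotes WHERE ticker = ?", (ticker,)
    ).fetchone()
    return row[0] if row else None

# Better: one optimized query, results fanned out to a worker pool in-process.
def process_all(conn, tickers, worker):
    placeholders = ",".join("?" * len(tickers))
    rows = conn.execute(
        f"SELECT ticker, price FROM quotes WHERE ticker IN ({placeholders})",
        tickers,
    ).fetchall()
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(worker, rows))
```

(And, of course, serve the Flask app behind a production WSGI server such as gunicorn or uWSGI rather than the built-in development server.)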
And this is where microservices are going today. Companies are so eager to get on the microservice bandwagon that *everything*, even the old reliable batch jobs, is being transitioned, often without a real understanding of the efficiencies that client-server computing, batch processing and other traditional techniques already provide.
At one very large financial company that I recently wrote some software for, there were hundreds of teams producing apps for customers. Each team had dozens of microservices. Yes, thousands of these little daemons were running, and you could scroll through one central control panel to look at them all (nice!).
Browsing the services at random, I found this:
- 25% of the services were, essentially, a “cache” of some slower database. Of those, 80% were hitting the *exact* same database. Sadly, they could all have been trivially replaced with a single cache. Worse, five separate teams had each built that very thing.
- 40% of the services processed data received “at market open”, usually from emails or FTP files… which was then pushed into the microservice soup (sometimes one line at a time, sometimes a whole file) … only to emerge later as a data file that got emailed anyway. Yes, a small percentage of them had a meaningfully complex internal processing pipeline that benefits from this architecture. But 90% could have been trivially replaced with a single program that runs on data receipt (a sketch of that simpler shape follows this list). Most of them just pushed the data to a service for which they were the only client… waited for the service to reply, and then wrote the data back out again… one file at a time!
- About 20% were bizarrely idiosyncratic. A few had odd dependencies on specific CPU types. Many relied on the presence of magical configuration files or (worse) generated source files… with no documentation as to how they were produced. In one case the service's original source code was missing entirely.
- About 15% “did the right thing” and had well-defined interfaces, distinct business rules, and weren’t “tightly coupled” to dozens of other services.
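For the 90% in that second bullet, the replacement really can be this boring: a single program, kicked off when the data arrives, that reads the file, does the work and writes the result back out. A minimal sketch, with hypothetical directory names and fields:

```python
import csv
from pathlib import Path

# Hypothetical drop locations; the real feeds arrived by email or FTP.
INBOX = Path("/data/inbox")
OUTBOX = Path("/data/outbox")

def transform(row):
    # Whatever the microservice soup did to each line, done in-process instead.
    row["notional"] = float(row["price"]) * int(row["quantity"])
    return row

def run_on_receipt():
    """Run once per delivered file, via cron, a job scheduler, or an FTP hook."""
    for src in INBOX.glob("*.csv"):
        with src.open(newline="") as f:
            rows = [transform(r) for r in csv.DictReader(f)]
        if not rows:
            continue
        dest = OUTBOX / src.name
        with dest.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
        src.unlink()  # consume the input so reruns don't double-process

if __name__ == "__main__":
    run_on_receipt()
```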
This can happen to anyone. I’ve worked at a large CRO, consulted with dozens of financial institutions, and talked shop with programmers from FANG companies.
Stuff that doesn’t work:
- Too often the overhead of the service architecture simply isn’t serving a business need. And while the *organization* of the services (consistent documentation, interfaces and protocols) is beneficial, that could be adequately supplied by a simpler set of batch-processing libraries, standard interfaces and a damn good job scheduler like Dkron (for example).
- Microservices proliferate *fast*. Unless the company has a great interface for job scheduling and documentation, developers will opt to “safely” build out brand-new microservices… even when a minor modification to an existing service would do.
Stuff that works:
- Consistent Schema and Protocol: Pick a handful of protocol and schema definitions you’re going to use *everywhere*. Make a library for them and enforce it rigidly. WebSockets + protobuf? Fine. REST + JSON Schema? No problem. And you could even use both (REST for client-facing stuff, for example, and Proto/WS for everything else). A minimal sketch of this idea follows this list.
- Consistent Interface Documentation: All inputs and outputs to the system are documented (not just I/O: env vars, tokens… everything), published, searchable and checked against the schema. Even if you’re a Python shop (for now), starting with Doxygen is probably a worthwhile effort. No matter how small you are, you will, eventually, start coding in other languages, and Doxygen supports basically all of them. Enforcement and consistency are *key* here.
- Follow OO Best Practices: Each microservice is, essentially, an “object”. Now, when you design a C++ library and you notice the same thing being done in 30 places in the code, what do you do? You make a function for it. It’s no different with microservices. Fight bloat by finding common themes across services and unifying them under libraries or, if appropriate, unified services. Don’t let the “micro” buzzword take over your whole organization.
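To illustrate the first point in the list above (consistent schema and protocol), here is one hedged sketch of what the shared enforcement library might look like, assuming a REST + JSON Schema stack and a made-up order message. Every service imports this one module at its boundaries instead of rolling its own serialization:

```python
import json

from jsonschema import validate  # pip install jsonschema

# One schema definition, owned by the shared library, used *everywhere*.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["order_id", "ticker", "quantity"],
    "properties": {
        "order_id": {"type": "string"},
        "ticker": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
    },
    "additionalProperties": False,
}

def dumps(message: dict, schema: dict = ORDER_SCHEMA) -> str:
    """Validate on the way out, so a bad message never leaves a service."""
    validate(instance=message, schema=schema)
    return json.dumps(message)

def loads(payload: str, schema: dict = ORDER_SCHEMA) -> dict:
    """Validate on the way in, so a bad message never gets processed."""
    message = json.loads(payload)
    validate(instance=message, schema=schema)
    return message
```

The same shape works for Proto/WS: the point isn’t the format, it’s that exactly one library makes the decision and everyone is forced through it.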