Scaling Applications: The Scale Cube

Scalability is one of the most important factors in the success of any application. Scaling an application is non-trivial, and it can become a nightmare if it is not planned well. Keeping a few best practices, architectural patterns and application design concepts in mind makes it much more straightforward.

The Scale Cube defines three ways of scaling an application:

  • Redundancy (x axis)
  • Micro-services (y axis)
  • Data Partitioning (z axis)

Redundancy: This means you deploy multiple instances of your application behind a load balancer. Each instance runs exactly the same copy of your application, just on a different machine. The load balancer decides which instance serves each request based on its load balancing strategy (round robin or least connections, for example).
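As a rough illustration of what a load balancing strategy does, here is a minimal round-robin sketch. The class name `RoundRobinBalancer` and the instance addresses are made up for this example; a real load balancer would also handle health checks, connection counts and so on.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical balancer that cycles through identical instances in order."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        # itertools.cycle repeats the list endlessly: 1, 2, 3, 1, 2, 3, ...
        self._cycle = itertools.cycle(self._instances)

    def pick(self):
        """Return the instance that should serve the next request."""
        return next(self._cycle)


# Assumed addresses; every instance runs the same copy of the application.
balancer = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
picks = [balancer.pick() for _ in range(4)]
```

After three requests the balancer wraps around, so the fourth request goes back to the first instance.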

This is absolutely fine if your application is small and manageable. But if your application is a huge monster with hundreds of modules packaged inside a single monolith, it soon becomes a disaster. Why? Let’s see:

  1. Since you have a monster application with a huge number of lines of code, making changes or implementing a new feature is not easy. A single bug can impact the whole application, and because every redundant copy runs the same code, the bug is present on every instance and becomes an availability issue for your overall system.
  2. The overall size of the application increases its startup time, which increases the time developers wait to test their changes and hence reduces productivity.
  3. As the application grows, it becomes very complicated to understand the whole system. Even making small changes or fixing bugs becomes a complicated task that involves testing the whole application.
  4. Upgrading or switching technologies is very hard, and a single technology might not suit each and every module of your application. For example, some modules might be better written in Go, while others are best written in Java.
  5. The choice of hardware is difficult and cannot be optimised. For example, for a module which is highly compute oriented you would like to use a compute optimised EC2 instance, and for another which is highly memory intensive you might want a memory optimised instance. Here you have to compromise the performance of one module for the other, just because both live inside one big monolith.
  6. Reliability is another big issue in monolithic apps. For example, a memory leak in one module can kill the whole process and hence all the modules.


Micro-services: Instead of having redundant copies of the same application, we can also split the application into smaller (micro-sized) interconnected services. Each service focuses on one small chunk of functionality and communicates with other services using APIs. The services can be split by distinct features like customer management, order management or payments. A micro-services architecture addresses the problems discussed above (as part of the x axis). However, it introduces a lot of complexity of its own: service to service communication over the network, service discovery, handling partial failures, added latency etc.
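To make "handling partial failures" a bit more concrete, here is a small sketch of one common approach: retry a downstream call a couple of times and then fall back to a degraded value, so one failing service does not take the whole response down. All names here (`ServiceUnavailable`, `call_with_fallback`, the two `fetch_*` functions) are hypothetical, and the "remote calls" are plain functions standing in for real API calls.

```python
class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) client when a downstream service fails."""


def call_with_fallback(call, fallback, attempts=2):
    """Try a remote call a few times, then return a degraded fallback value."""
    for _ in range(attempts):
        try:
            return call()
        except ServiceUnavailable:
            continue  # try again; a real client would usually back off here
    return fallback


def fetch_order():
    # Stand-in for an API call to a hypothetical order management service.
    return {"orderId": 42, "status": "SHIPPED"}


def fetch_recommendations():
    # Stand-in for a hypothetical recommendations service that is down.
    raise ServiceUnavailable("recommendations service timed out")


# The page still renders with order data even though one service failed.
page = {
    "order": call_with_fallback(fetch_order, fallback=None),
    "recommendations": call_with_fallback(fetch_recommendations, fallback=[]),
}
```

In a monolith both lookups would be in-process function calls; once they sit behind the network, each caller has to decide what a sensible fallback looks like.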

For huge applications that have to handle a lot of scale, it is usually better to use a micro-services architecture, but it also depends on the use case.


Data Partitioning: This belongs to the z axis of the scale cube. In this approach you don’t split your application by distinct feature; instead you split the traffic based on the request. Some attribute of the request is used to identify the server which will serve it. For example, based on the customerId, we can use a hashing algorithm to generate a hash which corresponds to a particular instance. It’s like a hash table where each hash key maps to an instance and the keys are derived from the request attribute.

This basically means that all the instances run the same code, but they handle different subsets of the data. For example, any request for a customerId lying in the first bucket will always be served by the first instance, any request for a customerId lying in the second bucket will be served by the second instance, and so on. Splitting the traffic by a request attribute can be a quicker way to scale your application than breaking your monolith into micro-services.

The traffic management job can be done by a separate component, which keeps track of the instances and forwards each incoming request based on the hash table entries.
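A minimal sketch of such a routing component could look like this. The `ShardRouter` class and the instance names are assumptions made for the example, and hash-then-modulo is just one simple way to map a customerId to a bucket.

```python
import hashlib


class ShardRouter:
    """Hypothetical router that maps a customerId to one fixed instance."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)

    def route(self, customer_id):
        """Return the instance responsible for this customerId."""
        # Use a stable hash: Python's built-in hash() of a string changes
        # between processes, which would break routing after a restart.
        digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % len(self._instances)
        return self._instances[bucket]


# Assumed instance names; all of them run the same code.
router = ShardRouter(["shard-0", "shard-1", "shard-2"])
first = router.route("customer-1001")
second = router.route("customer-1001")  # same customerId, same instance
```

One thing to keep in mind with plain modulo hashing is that adding or removing an instance changes the bucket for most keys; consistent hashing is a commonly used alternative when the number of instances changes often.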

Which of the above approaches are you using? I have worked with all three, and in my experience micro-services work best when you want to follow the latest software development practices, use state of the art technologies and maintain world class software at scale.





