There are several reasons, why you would want to distributed your database.
- Keep your data closer to your customers. Lower network latency.
- Compliance with policies such as GDPR. Some countries have restrictions around movement and storage of data outside their geographic boundaries.
- Higher availability and fault tolerance. Such that an incident in one geographical location does not impact customers residing in other part of the world.
Distributing a stateless application is easy, but distribution of your state (data layer) is challenging. There are several ways you can approach distributing your database. Each way has its own pros and cons and understanding them is critical. In this article lets look at some ways you can Distribute your database across several geographical location.
Lets take a Pizza delivery Service as an example.
Approach 1: Scaling and Distributing with read replicas
- In this approach you have a single primary database and several read replicas.
- The primary database is responsible for serving all the writes (globally). As you can see in the diagram above, all the “place order” requests are still going to the primary database.
- The read replicas are globally distributed and each replica serves “reads” for a specific region. In the above ex. DE, Indian and UK are different regions for example.
- These read replicas are simple to spin up and can significantly improve the read latency, because they are closer to the customer in a specific region.
However there are some limitations and challenges:
- Writes are still slow and are served by a single node. So if this node goes down, your customer can’t place any order.
- Read replicas are kept in sync with the primary using async replication strategies. Async means they will have some lag.
- For ex. your customer from India places and order. This order is stored on the primary node. However the read replica in India does not have the information yet, because replication is slower. Now if the customer wants to look at their order, it is possible that they might NOT see the order, which is a terrible UX.
- Stale reads is a huge problem for many applications, so we need to be careful.
- Your orders are still going outside the country, which might have some compliance complications.
Advantages of this approach:
- Easy setup, and easy to spin up new replicas.
- Easy to understand
- Works nicely for read heavy systems where strong read consistency isn’t a critical requirement.
Approach 2: Deploying separate database per region
This approach is pretty straightforward, you just deploy a new database (shard) which serves a single region.
So as you can see in the above diagram, each region India, Germany and UK has their own database instance. Customers from respective regions are served from the local (to the country) database.
So when a customer creates an order, you route their request to a region specific app server which talks to a region specific database and everything is cleanly separated.
This is a simple model to understand and gives you an ability to scale each database (vertically, or with read replicas) separately as per needs. For ex: if your Pizza is more popular in India you can vertically scale the Indian database and other databases need no changes.
It is simple approach, but comes with some limitations and problems:
- This is costly and takes heavy lifting whenever you want to launch your app in a new region.
- You no longer have a single database for your entire application, which means you cannot query them all together for running some analytical queries. You need a different solution that gives you federated access to all the databases.
Approach 3: Using Natively Distributed Database (Cloud)
There are databases solutions available that support Geo-Distribution natively. So it makes a lot of sense to leverage their capabilities for a Geo-Distributed application. You just need to model your data keeping in mind the distribution of data. So as you can see in the above diagram, you have your Pizza orders table.
- Each row (order) has a region column (it can be anything like country code, location, country name etc)
- Each region maps to a database partition which is deployed in that region.
- So orders from India are routed to the Database partition deployed in India. DE goes to DE partition and similarly UK goes to the UK partition.
- This is still a single database with partitions serving reads and writes for a specific region.
How is it different?
- Because the database support distribution natively YOU as an application developer OR infra person don’t need to worry about the distribution, deployment, query latency, unified access to all regions and so on.
- The heavy lifting is done by the database and you can focus on your area of expertise which is innovating in Pizzas and giving your customers an amazing experience.
There are many Distributed SQL databases available in the market, but the leading ones are Google Spanner, Yugabyte DB and CockroachDB.
Here is a sneak peak of how your Distributed Application would look like:
To know more about Designing Geo Distributed applications you can head over to the podcast I did with Denis Magda from Yugabyte, where we have discussed everything you need to know about the various strategies.
I hope you enjoyed learning the article. Stay tuned! Subscribe to The GeekNarrator youtube channel.