Let’s design a datastore for the products of an e-commerce website. I chose this topic mainly to explore scalability: the product catalog is one of the most challenging datastores to design in e-commerce, because the number of products grows rapidly, and working through it will help us understand how to keep our datastore up with that growth.
I will design this datastore in multiple iterations, covering the different phases of a company that started as a small online book seller and grew to sell everything in the world, along with the lessons learned in each iteration.
We want to design a datastore for the products of a recently launched e-commerce website: an online book store where we will be selling books.
Since the website is relatively new, we anticipate 10K user sign-ups, 10K products (books) to sell, and 5K QPS at peak.
Mark I (V1)
We need to design our datastore to store the products for our e-commerce website.
- Should be able to store 10s of thousands of product details
- Should be able to serve 10K QPS, 2x of the max QPS
- Should not have any data loss
We can use a MySQL DB, which will support all the requirements mentioned above. Let’s create a data model for our products and store it in a simple MySQL DB with replicas for data resiliency.
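As a rough sketch of what that data model might look like, here is a minimal product table, using Python's built-in SQLite as a stand-in for MySQL. The table and column names are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# In-memory SQLite as a stand-in for the MySQL product table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        author       TEXT,
        isbn         TEXT UNIQUE,
        price_cents  INTEGER NOT NULL,
        stock        INTEGER NOT NULL DEFAULT 0
    )
""")
conn.execute(
    "INSERT INTO products (title, author, isbn, price_cents, stock) VALUES (?, ?, ?, ?, ?)",
    ("The Pragmatic Programmer", "Hunt & Thomas", "978-0135957059", 3999, 12),
)
row = conn.execute(
    "SELECT title, price_cents FROM products WHERE isbn = ?",
    ("978-0135957059",),
).fetchone()
print(row)  # ('The Pragmatic Programmer', 3999)
```

At this scale a single well-indexed table like this, plus replicas, is all the design we need.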
We can connect directly to the DB from the backend service, query the products, and serve the request; the implementation is straightforward.
It isn’t a fancy design, but it works for our current use case. Now our website has gained traction: more and more users are visiting, and customers have started to complain that it is very slow. Of course, we should have had metrics tracking this latency so we could have caught it before it happened.
One thing we can do to reduce latency is add more backend service instances behind a load balancer, and it will work. Say that in six months your customer base grows 2x. Adding more service instances doesn’t help anymore, because now the database starts to degrade.
Once the database reaches its maximum throughput, we can’t squeeze more juice out of it. But we have another trick in the book to improve latency: caching. We add a cache on top of the DB to reduce the load on the database, and it works well.
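The cache sits between the service and the DB in the classic cache-aside pattern: check the cache first, and only query the database on a miss. A toy sketch, where a plain dict stands in for both the cache and the MySQL lookup, and the TTL is an arbitrary illustrative value:

```python
import time

# Cache-aside: check the cache first, fall back to the "DB" on a miss.
DB = {42: {"title": "Clean Code", "price_cents": 2999}}   # stand-in for MySQL
CACHE = {}           # product_id -> (expiry_timestamp, row)
TTL_SECONDS = 60     # illustrative TTL

def get_product(product_id):
    entry = CACHE.get(product_id)
    if entry and entry[0] > time.time():
        return entry[1]                       # cache hit: no DB load
    row = DB.get(product_id)                  # cache miss: query the database
    CACHE[product_id] = (time.time() + TTL_SECONDS, row)
    return row

get_product(42)         # first call hits the DB and fills the cache
hit = get_product(42)   # second call is served entirely from the cache
print(hit["title"])     # Clean Code
```

In production the dict would be something like Redis or Memcached, but the flow is the same: most reads never reach the database.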
But our product managers have decided to add more categories, starting with gadgets, mobiles, computers, and so on, and more importantly, we want to become a marketplace rather than a retailer. Now we realise the current design will not fit all the products on a single DB machine.
Let’s redesign the catalog datastore 🙂
War Machine (V2)
As previously mentioned, we need to redesign our datastore to support millions of products and hundreds of thousands of sellers. A traditional MySQL DB is difficult to scale: it allows only vertical scaling, by adding RAM and HDD/SSD, and that has a hard limit you cannot scale beyond.
- Should be able to store 10s of millions of product details
- Should be able to serve 100K QPS, 2x of the max QPS
- Should not have any data loss
We decided to horizontally scale the MySQL DB. The requirements for the datastore have changed: we need it to support horizontal scalability so that, as more products are onboarded in the future, we can easily scale out. We can use sharding to horizontally scale the DB; please read up on sharding for more info.
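The simplest sharding scheme is to hash the product key and take it modulo the shard count. A minimal sketch, with illustrative shard names:

```python
import hashlib

# Static hash sharding: hash the key, modulo the shard count.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

owner = shard_for("product:12345")
print(owner)   # the same key always maps to the same shard
```

Note the built-in weakness that will bite us later: if `len(SHARDS)` changes, almost every key maps to a different shard, forcing a full re-shard.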
A common question arises here: previously it was simple, one node had all the data and we could easily connect to it, but in a sharded database the data is scattered across the shards, so how do the clients know which shard to call for the data?
We decided to build a dumb client: the client connects to one of the nodes in the cluster and sends it the queries; internally that node identifies the primary shard for each query, scatters the requests, gathers the responses, and sends them back to the client.
Why this approach?
We want to keep the client very simple: it connects to a single node and lets that node handle changes in the primary shards; if the node fails, the client moves to another available node. This also keeps the number of connections on the server side low.
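The coordinator's job can be sketched in a few lines: group the requested keys by owning shard (scatter), fetch from each shard, and merge the results (gather). The routing rule and data here are toy stand-ins, not the real implementation.

```python
# Dumb-client model: the client sends all keys to one coordinator node,
# which routes each key to its primary shard, scatters, and gathers.
NUM_SHARDS = 2
SHARD_DATA = [dict() for _ in range(NUM_SHARDS)]   # toy in-process "shards"

def primary_shard(key: str) -> int:
    return sum(key.encode()) % NUM_SHARDS          # deterministic toy routing

def put(key, value):
    SHARD_DATA[primary_shard(key)][key] = value

def coordinator_query(keys):
    by_shard = {}
    for k in keys:                                  # scatter: group by shard
        by_shard.setdefault(primary_shard(k), []).append(k)
    results = {}
    for shard_id, shard_keys in by_shard.items():   # gather: merge responses
        shard = SHARD_DATA[shard_id]
        results.update({k: shard[k] for k in shard_keys if k in shard})
    return results

put("p1", "book-1")
put("p2", "book-2")
print(coordinator_query(["p1", "p2"]))  # {'p1': 'book-1', 'p2': 'book-2'}
```

The client only ever sees `coordinator_query`; all the routing knowledge lives on the server side, which is exactly what makes this design simple and, as we'll see, what makes it chatty.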
This design solved the scaling problem, but as the number of products increased, the amount of data exchanged between the nodes increased, which started to degrade the performance of our datastore. We had another major issue: the clients needed all the node information to connect, machines always tend to fail, and every time one did we had to change the database node config and restart the clients.
Hulk Buster (V3)
As mentioned in the previous version, we had to address the performance issue caused by the information exchange during data fetches, and remove the clients’ dependency on server information.
We decided to introduce ZooKeeper to store all the node information for the datastore and update it whenever a shard changes. The client connects to ZooKeeper and refreshes its node info, which requires no downtime. We can also make the client aware of the sharding algorithm, decoupling the primary-shard lookup from the shards themselves. The idea is to make the shards dumb and the client smarter.
So the client becomes a bit smarter: it fetches the shard info from ZooKeeper, uses the sharding algorithm to find the shardId, and then, using the node information, calls the primary shards for all the keys with scatter-gather itself. This significantly reduces the information exchange between shards and increases performance.
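A sketch of the smart-client routing, assuming the shard map has already been fetched from ZooKeeper (a plain dict stands in for it here, and the node addresses are made up). The key point is that the client computes the shard id locally and calls the primary node directly, with no coordinator hop:

```python
import hashlib

# Smart client: the shard map would be fetched from ZooKeeper and kept
# fresh via watches; here a dict stands in for it.
SHARD_MAP = {            # shard_id -> primary node address (illustrative)
    0: "10.0.0.1:3306",
    1: "10.0.0.2:3306",
}

def shard_id_for(key: str) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(SHARD_MAP)

def primary_node_for(key: str) -> str:
    # Direct call to the owning node: no inter-shard scatter needed.
    return SHARD_MAP[shard_id_for(key)]

node = primary_node_for("product:777")
print(node)
```

When a shard moves, ZooKeeper notifies the client and only `SHARD_MAP` changes; no client restart is needed.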
The Hulk Buster version of the datastore worked quite well for a long time, and we faced no scaling issues until our product managers decided to add another vertical, the mother of all categories: apparel. You may wonder what is wrong with apparel, and why it is so big.
Consider a T-shirt available in 4 different colours and 4 different sizes: that is 16 products, and just for men; then you have women and kids, and each item can be sold by multiple sellers. We had around 100 million products in our inventory by then, but adding apparel will 3x that. This is a massive amount of data; the current sharding may not even cope, and we need to rethink our sharding and more.
Time for a redesign 😉
Mark L (V4)
Over time we had realised there were a lot of issues with the existing sharding algorithm, and it was really hard to scale the database with it.
- Any change in the shards is complicated: it requires re-sharding, which could lead to downtime
- Writes were affected in our current sharding if the primary shard went down. Reads were fine, though, because they can be served by read replicas
- Rebalancing the shard was painful
- We cannot have an asymmetrical node config, since our sharding gives all nodes equal weight
We need dynamic sharding that is resilient to node failures. Fortunately, there is already a solution for this exact use case: “Consistent Hashing”.
Consistent hashing is a simple yet powerful solution for dynamic sharding. The idea is that the hash space is arranged as a ring in ascending order, and the nodes are also hashed onto positions on that ring.
We hash the client’s key to find its position on the hash ring, then move clockwise to the next available node: that is the primary shard/node for the key. With this approach the client doesn’t need to track which node is primary, or implement fallback-to-replica logic, and so on. It keeps the client behaviour very simple.
We store the replicas on the subsequent nodes clockwise on the hash ring. If the replication factor is 2, we store the data on the 2 adjacent nodes that follow clockwise.
We use the same strategy as for locating a key: replica data goes on the next nodes clockwise on the ring. If a node fails, the client’s query automatically moves to the next node on the ring, where the replica is present.
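Putting the last few paragraphs together, here is a minimal consistent-hash ring, using md5 for placement (an illustrative choice; any uniform hash works). `nodes_for` returns the primary plus its clockwise replicas, so a failed primary is naturally followed by the node that holds the replica:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Nodes hashed onto a ring; a key is owned by the first node clockwise."""
    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, key, replication_factor=2):
        """Primary node plus replicas, walking clockwise from the key's hash."""
        idx = bisect.bisect(self._positions, _h(key)) % len(self._ring)
        out = []
        while len(out) < replication_factor:
            node = self._ring[idx % len(self._ring)][1]
            if node not in out:        # skip duplicate positions
                out.append(node)
            idx += 1
        return out

ring = HashRing(["node-a", "node-b", "node-c"])
primary, replica = ring.nodes_for("product:42", replication_factor=2)
print(primary, replica)   # the primary and the next distinct node clockwise
```

Adding or removing one node only remaps the keys in that node's arc of the ring, which is exactly the dynamicity the old modulo scheme lacked.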
In the current design, consistent hashing gives us the dynamicity we wanted in our sharding.
- We can add more nodes to scale the shards
- When a node goes down, the immediate clockwise replica automatically takes over the shards owned by that node, so a node failure causes no downtime for reads or writes
- We can have asymmetrical nodes in the config. Say you have 2 machines, m1 and m2, where m2 has 2x the capacity of m1. You just make sure m2 appears twice as often on the hash ring, which is quite easy to do
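The asymmetric-node trick is usually implemented with virtual nodes: a machine with weight 2 is hashed onto the ring at twice as many positions, so it owns roughly twice the key space. A sketch, where the 100-virtual-nodes-per-unit-of-weight factor is an arbitrary smoothing choice:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(weights):
    """Place `weight * 100` virtual nodes per machine on the ring."""
    ring = []
    for node, weight in weights.items():
        for i in range(weight * 100):
            ring.append((_h(f"{node}#vnode{i}"), node))
    return sorted(ring)

def owner(ring, key):
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect(positions, _h(key)) % len(ring)
    return ring[idx][1]

# m2 has 2x the capacity of m1, so it gets twice the virtual nodes.
ring = build_ring({"m1": 1, "m2": 2})
counts = {"m1": 0, "m2": 0}
for k in range(1000):
    counts[owner(ring, f"product:{k}")] += 1
print(counts)   # m2 should own roughly twice as many keys as m1
```

Many virtual nodes per machine also smooths out the load distribution and spreads a failed node's keys across several successors instead of dumping them all on one neighbour.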
The version names are Iron Man armour names from the Marvel Iron Man comics and movies.