You may think that Sharding is the solution to most of your DB scaling issues, but be considerate of areas you have to take care of while doing sharing:

  • Add complexity to the system
  • Database Joins become more expensive and not feasible in certain cases
  • Sharding can compromise database referential integrity
  • Database schema changes can become extremely expensive
  • No native support always

How Sharding adds complexity in the system?

  • Implementing a sharded database architecture is a complex task, thinking about all current and future scenarios
  • If not done correctly, there is a risk that the sharding process can lead to lost data or corrupted tables.
  • Shards can be complicated to get right, particularly if your shard key isn’t obvious.

How Database Joins become more expensive and not feasible sometimes in case of Sharding?

  • When all the data is located in a single database, joins can be performed easily. But when you Shard the database, joins have to be performed across networked servers which can introduce additional latency for your service. 
  • Sometimes cross-server joins may not be an option (some servers support some doesn’t), then you may have to bring all data to application memory and perform join on your own.
    • Or keep the data in de-normalized form so that queries that previously required joins can be performed from a single table

How Sharding can compromise database referential integrity?

  • Most RDBMS do not support foreign keys across databases on different database servers.
    • Hence referential integrity often has to be enforced in application code
  • Cascade deletes are not possible, hence applications have to run regular SQL jobs to clean up dangling references.
  • If you’re using NoSQL DB, this may not bother you much, as you already know you will not get referential integrity.

How Database schema changes can become extremely expensive because of sharding?

  • Over time you may have to evolve your schema for any of reasons like:
    • Shard outgrows other shards and becomes unbalanced, which is also known as database hotspot.
      • If you are storing business places by the first character of business name and there are too many places with the character ‘A’.
      • Too many requests going to single shard.
      • You might have been storing user pictures and user emails in the same shard and now need to put them on different shards.
    • In such cases, either we have to create more DB partitions or have to rebalance existing partitions, which means the partitioning scheme changed and all existing data moved to new locations. Doing this without incurring downtime is extremely difficult or will have downtime.

Is Sharding supported by all database engine?

Sharding is not natively supported by every database engine.

Rakesh Kalra

Hello, I am Rakesh Kalra. I have more than 15 years of experience working on IT projects, where I have worked on varied complexity of projects and at different levels of roles. I have tried starting my own startups, 3 of those though none of it were successful but gained so much knowledge about business, customers, and the digital world. I love to travel, spend time with my family, and read self-development books.