What is the CAP Theorem?
CAP theorem states that any distributed system can support only two among :
- Partition tolerance
Before we proceed, understand that in a distributed system, each instance of system is referred as node below.
What do you mean by Consistency?
- When data is distributed among nodes, all nodes see the same data at a given time.
- When queried, each node will return the latest data. To achieve consistence, when we write data into one nodes, all the nodes should be updated (should have updated view of data) before allowing read on that data.
For example: Consider your ATM machines as part of consistent systems. If you add or remove money from 1 ATM machine, you do not want to see different view when you go to different ATM.
What do you mean by Availability?
- At all times, every request being fired at the system generates a valid response, even if one or more nodes went down. Both read and write requests should get allowed, if only read is allowed and write is not allowed, we can’t say the system is available.
- Another way to state this—all working nodes in the distributed system return a valid response for any request, without exception.
Example: When you are using Instagram, you want your Picture feed to be available all the time, no matter if one or more servers of Instagram goes down. You may be ok with not getting latest of information or picture feed in this case, but what will be the worst is you getting message that one of the Instagram server is down, please come back after few minutes.
What do you mean by Partition Tolerance?
Partition tolerance implies:
- A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes.
- Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
- Once the network is re-established, nodes may flow the latest information to each other.
Example: Let’s again take an Instagram example. Consider there is a network break between the Instagram server. How do you want Instagram servers you are connecting to behave in this case? With a message that there is a network break and we will serve you pictures in some time, or you would like to see your picture feed, though may not the latest pictures of all people you follow. And when the network is re-established, you will get the latest pictures.
How are Systems classified based on the CAP theorem?
Systems are classified into three types under CAP Theorem:
- Data is consistent between all nodes, and you can read/write from any node, while you cannot afford to let your network go down.
- A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance.
- For example: RDBMS like MSSQL Server, Oracle, MySQL, Postgress
- A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.
- For example: Redis, Google Big Table, MongoDB, and HBase
- An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)
- For example: CouchDB (document-oriented), Dynamo DB, and Cassandra (columnar)
How to use CAP theorem during system design interview?
So by now we know that different data stores provides different types of guarantees, some provides consistency, some partition tolerance and some availability.
Now during system design interview, understand the requirement of system. Ask yourself:
- Do I need a consistent view of data for all servers throughout the world at every given moment?
- Or is it the availability of the system which is more important to me?
- Or is it ok, if network breaks occur but the system continues to serve stale data?
Can you give me a classification of important data stores being used as CA, CP, or AP?
|RDBMS Systems like:|
Google Big Table
How is CA possible in Distributed architecture?
We listed this type last for a reason—in a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed database in theory, for all practical purposes, a CA distributed database can’t exist.
However, this doesn’t mean you can’t have a CA database for your distributed application if you need one. Many relational databases, such as PostgreSQL, deliver consistency and availability and can be deployed to multiple nodes using replication.
Working with microservices
Understanding the CAP theorem can help you choose the best database when designing a microservices-based application running from multiple locations.
For example, if the ability to quickly iterate the data model and scale horizontally is essential to your application, but you can tolerate eventual (as opposed to strict) consistency, an AP database like Cassandra or Apache CouchDB can meet your requirements and simplify your deployment.
On the other hand, if your application depends heavily on data consistency—as in an eCommerce application or a payment service—you might opt for a relational database like PostgreSQL.
Final thoughts on CAP theorem
Each one of the three properties, namely, Availability, Consistency and Partition Tolerance, should not be viewed as a binary off/on switch , but rather as tunable parameters when you’re designing a distributed system. That is, if you opt for more consistency, you’ll need to make your availability or partition tolerance requirements little lax. Conversely, you can tune up your availability if you are prepared to sacrifice some consistency or network partition tolerance.