What is the CAP Theorem?
CAP theorem states that any distributed system can support only two among :
- Partition tolerance
Before we proceed, understand that in a distributed system, each instance of system is referred as node below.
What do you mean by Consistency?
- When data is distributed among nodes, all nodes see the same data at a given time.
- When queried, each node will return the latest data. To achieve consistence, when we write data into one nodes, all the nodes should be updated (should have updated view of data) before allowing read on that data.
For example: Consider your ATM machines as part of consistent systems. If you add or remove money from 1 ATM machine, you do not want to see different view when you go to different ATM.
What do you mean by Availability?
- At all times, every request being fired at the system generates a valid response, even if one or more nodes went down. Both read and write requests should get allowed, if only read is allowed and write is not allowed, we can’t say the system is available.
- Another way to state this—all working nodes in the distributed system return a valid response for any request, without exception.
Example: When you are using Instagram, you want your Picture feed to be available all the time, no matter if one or more servers of Instagram goes down. You may be ok with not getting latest of information or picture feed in this case, but what will be the worst is you getting message that one of the Instagram server is down, please come back after few minutes.
What do you mean by Partition Tolerance?
Partition tolerance implies:
- A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes.
- Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
- Once the network is re-established, nodes may flow the latest information to each other.
Example: Let’s again take an Instagram example. Consider there is a network break between the Instagram server. How do you want Instagram servers you are connecting to behave in this case? With a message that there is a network break and we will serve you pictures in some time, or you would like to see your picture feed, though may not the latest pictures of all people you follow. And when the network is re-established, you will get the latest pictures.
How are Systems classified based on the CAP theorem?
Systems are classified into three types under CAP Theorem:
- Data is consistent between all nodes, and you can read/write from any node, while you cannot afford to let your network go down.
- A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance.
- For example: RDBMS like MSSQL Server, Oracle, MySQL, Postgress
- A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.
- For example: Redis, Google Big Table, MongoDB, and HBase
- An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)
- For example: CouchDB (document-oriented), Dynamo DB, and Cassandra (columnar)
How to use CAP theorem during system design interview?
So by now we know that different data stores provides different types of guarantees, some provides consistency, some partition tolerance and some availability.
Now during system design interview, understand the requirement of system. Ask yourself:
- Do I need a consistent view of data for all servers throughout the world at every given moment?
- Or is it the availability of the system which is more important to me?
- Or is it ok, if network breaks occur but the system continues to serve stale data?
Can you give me a classification of important data stores being used as CA, CP, or AP?
|RDBMS Systems like:|
Google Big Table