What is columnar database?

It is a type of NoSQL database, also called wide-column store or extensible record stores. 

sign language asl GIF by Sign with Robert

It uses tables, rows, and columns, but names and format of the columns can vary from row to row in the same table. 

While a relational database is optimized for storing rows of data, typically for transactional applications, a columnar database is optimized for fast retrieval of columns of data, typically in analytical applications. 

How are Columnar databases different from Row oriented databases ?

First understand Row Oriented Databases:

General databases which we have used till now like MySQL, Oracle etc are Row oriented databases. These databases store data on file system row by row, where a single row represent different fields related to record. 

Consider this table:

IDnamepricecategoryavailabilitystatus
1COFFEE100GROCERYYESAPPROVED
2COOKIES50GROCERYYESAPPROVED
3MOBILE10000ELECTRONICSNOIN_REVIEW
4TV20000ELECTRONICSYESAPPROVED
5MUSIC_SYSTEM5000ELECTRONICSYESREJECTED
6CAR500000BOOKNOREJECTED

Table rows are stored in sequential order. Hence on filesystem this table may look like:

1,COFFEE,100,GROCERY,YES,APPROVED;2,COOKIES,50,GROCERY,YES,APPROVED;3,MOBILE,10000,ELECTRONICS,NO,IN_REVIEW;4,TV,20000,ELECTRONICS,YES,APPROVED;5,MUSIC_SYSTEM,5000,ELECTRONICS,YES,REJECTED;6,CAR,500000,BOOK,NO,REJECTED

Columnar databases:

In columnar database same records will be stored, column by column. Hence on file system above table will look like:

1,2,3,4,5,6;COFFEE,COKKIES,MOBILE,TV,MUSIC_SYSTEM,CAR;100,50,10000,20000,5000,500000;GROCCERY,GROCCERY,ELECTRONICS,ELECTRONICS,ELECTRONICS,BOOK;YES,YES,NO,YES,YES,NO;APPROVED,APPROVED,IN_REVIEW,APPROVED,REJECTED,REJECTED

You will notice that first all IDs are stored, then Names, then prices and so on.

The Wire Reaction GIF

What are the benefits of using Columnar databases?

Since you are storing all the column data together, some of the queries will be very fast. Consider the query to execute is: How many items have price greater than 10,000.

  • Since all prices are being stored together, you can easily count total prices with value greater than 10,000. 
  • In row oriented database, since prices are stored row by row, total Disk IO will be much more than column based datastore as the required data is separated by other fields data.
good mythical morning agree GIF by Rhett and Link

Columnar database have following advantages:

  • Aggregation queries, where you have to just work on subset of data, will be very fast.
  • Since data type of each column is similar, you can get better compression when running compression algorithms on each column making queries even faster. 

To confirm your understanding, lets go little deeper:

Storage systems have to pull data from physical disk drives, which store information magnetically on spinning platters using read/write heads that move around to find the data that users request. The less the heads have to move, the faster the drive performs. If data is kept closer together, minimizing seek time, systems can deliver that data faster.

love and hip hop knowledge GIF by VH1

Well, what if I needed to query multiple values for each item (basically row)?  

Pondering Which One GIF by Tkay Maidza

You are thinking right, you may have many cases where you have to get all data for given item basically row by row. For example you have to display details of all Items in your inventory. In such cases row oriented databases makes more sense, then columnar databases.

So, more the fields you have to read per record, lesses will be the benefit of columnar database.

Hence, generally when you are creating your app you are using row oriented databases (MySQL, Oracle etc) only. Columnar databases have their specific uses (we will be discussing this later in this article)

Are there any performance differences between columnar storage, compared to Row oriented databases when writing data?

Oh my god, you are asking all the right questions 🙂

Smart Think About It GIF by Friends

Yes, writing new data can take more time:

  • When you are inserting new record into row-oriented database, you simply write that in one operation. But if you’re inserting a new record to a columnar database, you need to write to each column one by one. 
  • As a result, loading new data or updating many values in a columnar database could take much more time.

It seems I shouldn’t use Columnar database then?

Something Fishy No GIF by Physics Girl

Well both databases may have their place in your application.

  • You would usually want a row-oriented database like MySQL running the back-end of your web app, etc. 
  • And once your app becomes huge, you would also want to consider having a columnar database like Amazon Redshift to run your BI (business intelligence) analytics queries (which usually consist of aggregation queries). 

Remember:

Columnar databases are good for:

  • Queries that involve only a few columns
  • Aggregation queries against vast amounts of data
  • Column-wise compression

But are not so good at:

  • Incremental data loading
  • Online Transaction Processing (OLTP) usage
  • Queries against only a few rows

What is advantage of columnar database?

It allows for fast retrieval of columns of data and can be scaled using distributed clusters of low-cost hardware to increase throughput.

Where are they generally used?

Mainly used for:

  • Analytical applications because it allows for fast retrieval of columns of data.
  • Data warehousing and big data processing because they scale using distributed clusters of low-cost hardware.

Can yo give example of columnar databases?

Yes. Here are few:

  • Amazon Redshift
  • BigQuery
  • Apache HBase
  • Apache Cassandra

Rakesh Kalra

Hello, I am Rakesh Kalra. I have more than 15 years of experience working on IT projects, where I have worked on varied complexity of projects and at different levels of roles. I have tried starting my own startups, 3 of those though none of it were successful but gained so much knowledge about business, customers, and the digital world. I love to travel, spend time with my family, and read self-development books.

0 Comments

Leave a Reply