What is bitmap indexing?
Bitmap Indexing is a type of database indexing that uses bitmaps.
Bitmap indexes provide greatest advantage when applied on columns having low cardinality, that is, columns in which the number of distinct values is small compared to the number of rows in the table.
What is Cardinality?
Cardinality means uniqueness of data in column.
- A column representing Gender column where gender can be Male or Female, will have cardinality as 2.
- A column representing Category column where were we have 100 types of categories, will have cardinality as 100.
What is bitmap?
Bitmap is array of bits which uses sequence of bits to store information.
I don’t understand anything you wrote above, can you tell me in layman language?
Let’s start with an example:
- Consider you have a table to store Amazon inventory (Inventory table)
- Your inventory table looks like:
In above table, following columns are good candidate for Bitmap Index:
- Category [Cardinatlity is 3]
- Availability [Cardinality is 2]
- Status [Cardinality is 3]
In Bitmap indexing, number of bitmap required will be equal to cardinality for that column.
Let’s create Bitmap for Category and Status column as example.
Based on which ID have which category we set the bit to 1
Based on which ID have which status we set the bit to 1:
So, how this bitmap data helps when we query for any data?
Lets consider few examples:
|what to find?||how data is found?|
|Find all records with status as APPROVED||Go through Status bitmap, and find all ID which have corresponding APPROVED status bit value as 1|
|Find all records with status as APPROVED or IN_REVIEW||We have to do OR operation of Bitmap index of APPROVED and IN_REVIEW status.|
Do a BITWISE OR of APPROVED status bitmap an IN_REVIEW status bitmap.
|Find all records where Category is GROCERY and Status is APPROVED||We have to do AND operation between GROCERY Category bitmap and APPROVED Status bitmap|
What type of indexes are good for column with high cardinality columns and which one are good for low cardinality columns?
B-tree indexes are most effective for high-cardinality data: that is, data with many possible values.
BitMap are most effective for low cardinality data.
- Cardinality of Gender column which only has two distinct values (male and female), is 2.
- This is good candidate for bitmap indexes.
- Cardinality of Item Name for table representing e-commerce inventory, where total rows are close to 100K and total columns close to 70K
- This is not good candidate for bitmap indexes.
What is the problem of using B+ Tree with data with low cardinality?
B+ Tree builds a big tree, and no matter what the cardinality is, still tree is build.
This tree consumes a lot more memory if you compare it Bitmap index if the cardinality of the column is low.
In BitMap we uses very small amount of memory, but performance is same as B+ tree.
Can bitmap index have better performance than B-Tree indexes?
On a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can out-perform a B-tree index, particularly when this column is often queried in conjunction with other columns.
Any query which I can use to find cardinality of column?
Well you can use this query:
SELECT column, COUNT(*) FROM table_name GROUP BY column;
Any special consideration while maintaining Bitmap indexes?
Maintaining a bitmap index takes a lot of resources (note we are talking about maintaining, not executing query using bitmap), therefore, bitmap indexes are only good for the read-only tables or tables that have infrequently updates.
Therefore, you often find bitmap indexes are extensively used in the data warehouse environment.