Exploring Database Indexes (part-1)
Database indexing is a method used to improve the query execution on a database table. It helps optimize query performance by reducing the amount of data that needs to be scanned to satisfy a query
But indexing has associated cost of additional space and some overhead during write operations. Throughout this article we will consider postgres database to better understand how indexing works.
There are various types of indexes that are available, and choice of index depends on how data has to be organised. Lets take look at various types of indexes available:
- btree
- gin
- hash
- lsm
B*Tree Index
Most commonly used is B-Tree index, which essentially supports queries to perform better based on sorting sequence. It operates similar to BST(Binary Search Trees)
Binary Search Trees
BST’s organise data in tree like structure, where all the keys from left side of root are less and on right side are greater. Small example of BST is as seen below.

There are challenges with implementation of because, if the keys that are inserted into table are either in ascending order or descending order, BST will be skewed towards left or right.

Now, if the tree is skewed, worst case complexity of search operation to search for key in index becomes O(n), which is when key is last element or its not found. In order to improve the search or query performance, depth/height of tree has to be balanced. This requires implementation of self balancing BST with help of AVL, RedBlack trees etc.
AVL self balancing trees, balances the height of tree with help or rotations (LL, LR, RL, RR) i.e. when each key is inserted into index, and if height is imbalanced it has to rotate with anyone of rotation techniques to balance its height, this equally impacts write performance.
Postgres uses extended version Lehman & Yao Algorithm for implementation of B-Tree with balancing factor configurable. This will ensure to safe gurad write paths during concurrent updates, which we will discuss in next parts.
Create Index
For simplicity let's consider and user table and analyse performance without index: See “Seq Scan on users” below
postgres=# explain select * from users where id=10;
QUERY PLAN
-------------------------------------------------------
Seq Scan on users (cost=0.00..17.75 rows=1 width=65)
Filter: (id = 10)
(2 rows)
Here are some quick ways to create index:
CREATE INDEX IF NOT EXISTS user_id_idx ON public.users (user_id);
postgres=# explain select * from users where id=10;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using user_id_idx on users (cost=0.28..8.29 rows=1 width=65)
Index Cond: (id = 10)
(2 rows)
see how it performs index scan to retrieve its value, clearly cost making diff
Specify index type: USING index method/type (btree, gin etc)
CREATE INDEX IF NOT EXISTS user_id_idx ON public.users
USING btree (user_id);
------------------------------------------------------------------------------
postgres=# explain select * from users where id=10;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using user_id_idx on users (cost=0.28..8.29 rows=1 width=65)
Index Cond: (id = 10)
(2 rows)
Use multiple key columns: first_name and last_name column
CREATE INDEX IF NOT EXISTS user_id_fname_lname ON public.users
USING btree (first_name, last_name);
postgres=# explain select * from users where first_name = 'Judy' and last_name='Crona';
QUERY PLAN
---------------------------------------------------------------------------------------------
Index Scan using user_id_fname_lname on users (cost=0.28..8.29 rows=1 width=65)
Index Cond: (((first_name)::text = 'Judy'::text) AND ((last_name)::text = 'Crona'::text))
(2 rows)
Use non key columns: use first_name,last_name in index.
When we query for first_name and last_name based on id, it performs Index Only Scan, which means database doesn’t need seek data from disk.
CREATE INDEX IF NOT EXISTS user_id_idx_fname_lname ON public.users
USING btree (user_id) INCLUDE(first_name,last_name);
------------------------------------------------------------------------------
postgres=# explain select first_name, last_name from users where id = 12;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Only Scan using user_id_idx_fname_lname on users (cost=0.28..8.29 rows=1 width=13)
Index Cond: (id = 12)
(2 rows)