Exploring Database Indexes (part-2)
Concurrent Index Updates
In our previous article we discussed about creating index and using them in advantage to perform efficient queries. These indexes can have overhead of storage and also maintenance.
Let’s understand underlying details of index updates and how database handles these concurrent requests to insert/update/delete entries from index.
Postgres leverages Lehman & Yao Algorithm for implementation with few tweaks, Efficient Locking for Concurrent Operations on B-Trees, this paper focuses on handling concurrent updates in index through its hierarchy from root.
In postgres every query is handled by process, which ensures each process has its own memory allocated to perform changes as needed by query and then flush them back to pages in disk. A page is block of memory which is 8kb by default. There can be multiple such processes operating on index, they can conflict paths when they are attempting to insert/update/delete them. There can be only one single process that can be allowed to update pages at leaf node for given page, which happens by obtaining exclusive write lock.
This paper highlights on how to safely handle concurrent execution, which mentions Bayer and Schkoinick method to obtain write-exclusion locks in path from root-leaf and allow readers to read and block other writers.
B-link-Tree for Concurrency
B link tree is an extension of B*-tree with an additional right pointer on the leaf nodes to access next page. This addition of right link will help during the page splits, i.e. when new key is inserted and leads to page split, a new right link is added from old page to new page.
Here are few rules that we need to understand from Lehman & Yao algorithm:
- Each node can have k +1 childrens, where k is configurable parameter
- Each node can store at max 2*k elements, if k=2 they can hold max 4 keys in node
- high-key: Every node will have an additional attribute called high key to store highest value.

- Safe Operation — Any operation that does not require page split, is considered as safe operation. If there is room to accommodate new key, it doesn’t require split.
- UnSafe Operation — Any operation that require page split, is considered as safe operation.
Insertion
Lets consider k=2 which means each node can store 2*k i.e. 4 keys in its page. Now when we attempt to insert key=3 into table, this will require index to be updated. During this process as shown in Fig2, this will require a page split to accommodate new insertion.

The purpose of the link pointer is to provide an additional method for reaching a node. When a node is split because of data overflow, a single node is replaced by two new node. The link pointer of the first new node points to the second node; the link pointer of the second node contains the old contents of the link pointer field of the first node.
Link pointer serves as a “temporary fix” that allows correct concurrent operation, when a new node is added as part of split operation all the hierarchies must be updated accordingly, and this is done by subsequent processes. If the search key exceeds the highest value in a node, it indicates that the tree structure has been changed, and that the twin node should be accessed using the link pointer.
Search
During query execution to search for element, it performs an index scan and during the scan process if it notices that if search key k is more than high value in node, then it infers that previous operation has changed structure and searches for element through next link and updates its hierarchies. It will be able to determine if search k actually exists in the index.
While Lehman & Yao Algorithm only speaks about addition of new right link pointer, postgres specifically adds new left link pointer as well to support scans backward, ensuring that once we descend to leaf pages in tree, we do not have to recurse back to its parent for scanning instead user their left and right pointers. (see Fig 3)
