Table Partitioning in Postgres

4 min readJul 15, 2024

Partitioning is the process of breaking up a huge database table into smaller, manageable child table. This process is carried for scaling, improving query performance and so many other reasons.

Choosing right partitioning strategy can become very crucial decision while we want to achieve better results. Poor partitioning strategies can create adverse effects than adding value. For example: After partitioning, if majority of data resides in single partition and if that is read intensive, then likely this partitioning strategy is not going to help.

The partitioned table itself is a virtual table having no storage of its own. Instead, the storage belongs to partitions, which are otherwise-ordinary tables associated with the partitioned table.

Postgres allows 3 ways of declarative partitioning:

List Partitioning
Range Partitioning
Hash Partitioning

For simplicity throughout this article we will consider company employee database, which has employees working across globe.

List Partitioning

Data is partitioned based on a discrete list of values for a specific column (e.g., partition by region).

CREATE TABLE employee (
    first_name VARCHAR(30) NOT NULL,
    last_name VARCHAR(30) NOT NULL,
    country char(2) NOT NULL,
    PRIMARY KEY (first_name, country)
) PARTITION BY LIST(country);

We know that there are limited country codes that exist, we can use list based partitioning strategy to divide this table into smaller tables, so that we can query those specific partitions while looking for geo specific data. Keep in mind that your partition column must be one of keys in primary key.

create table emp_in partition of employee for values in ('IN');
create table emp_us partition of employee for values in ('US');

insert into employee  values ('foo', 'a', 'US');
insert into employee  values ('bar', 'b', 'IN');

Range Partitioning

Data is partitioned based on ranges of column values (e.g., partition by date ranges).

Since we know that employee table organically grows over period of time as new employees join every year, we can partition this table based on data ranges (year, month, day etc). In our example lets consider to partition based on year. Look carefully of PARTITION BY syntax, which uses RANGE followed by column.

CREATE TABLE employee (
    first_name VARCHAR(30) NOT NULL,
    last_name VARCHAR(30) NOT NULL,
    country char(2) NOT NULL,
    start_time timestamp without time zone NOT NULL
) PARTITION BY RANGE (start_time);

Now lets start creating these child tables that looks similar to parent table. INCLUDING ALL will include indexes, comments as it is from main table.

ATTACH PARTITION will attach this new table employee_2024 to main table employee, while doing we will specify range of values that belong to this table. In our case it is (‘2024–01–01’) TO (‘2024–12–31’) for all employees who joined in year 2024 from Jan-01 to Dec-31

-- create new table of partition for year 2024
CREATE TABLE employee_2024 ( LIKE employee INCLUDING ALL );

-- Attach this newly created partition for year 2024
ALTER  TABLE employee ATTACH PARTITION employee_2024 
 FOR VALUES FROM ('2024-01-01') TO ('2024-12-31');
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-- create new table of partition for year 2023
CREATE TABLE employee_2023 ( LIKE employee INCLUDING ALL );

-- Attach this newly created partition for year 2023
ALTER  TABLE employee ATTACH PARTITION employee_2023 
 FOR VALUES FROM ('2023-01-01') TO ('2023-12-31');

Hash Partitioning

Data is distributed across partitions based on a hash function applied to one or more columns (e.g., partition by a hash of employee id).

Hashing partitioning is often used to distribute data equally based on hash when we do not have a natural way of partitioning your data or you want to evenly distribute the data based on hash.

In this we employ PARTITION BY HASH (emp_id), which will use the modulo to identify the right partition to store this record.

CREATE TABLE employee (
    emp_id bigint,
    first_name VARCHAR(30) NOT NULL,
    last_name VARCHAR(30) NOT NULL,
    country char(2) NOT NULL,
    start_time timestamp without time zone NOT NULL
) PARTITION BY HASH (emp_id);

---------------------------------------------------------------
CREATE TABLE employee_h0
PARTITION of employee
FOR VALUES WITH (modulus 5, remainder 0);

CREATE TABLE employee_h1
PARTITION of employee
FOR VALUES WITH (modulus 5, remainder 1);

CREATE TABLE employee_h2
PARTITION of employee
FOR VALUES WITH (modulus 5, remainder 2);

CREATE TABLE employee_h3
PARTITION of employee
FOR VALUES WITH (modulus 5, remainder 3);

CREATE TABLE employee_h4
PARTITION of employee
FOR VALUES WITH (modulus 5, remainder 4);

Partitioning pruning

After we partition data, we need to make sure that this can yield us best benefit while we query data to get results faster and optimise query plan.

Below is snippet that explains that though we partition, if we do not hint database to ignore unwanted partitions, it will go ahead and scan each of these. In our example, though we know that all the employees of US, will be in single partition, still query planner is going to scan emp_in, which contains hold employees of IN region.

employee=# explain analyse select * from employee where country = 'US';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=7.22..27.71 rows=4 width=168) (actual time=0.013..0.014 rows=1 loops=1)
   ->  Bitmap Heap Scan on emp_in employee_1  (cost=7.22..12.56 rows=2 width=168) (actual time=0.006..0.006 rows=0 loops=1)
         Recheck Cond: (country = 'US'::bpchar)
         ->  Bitmap Index Scan on emp_in_pkey  (cost=0.00..7.22 rows=2 width=0) (actual time=0.005..0.005 rows=0 loops=1)
               Index Cond: (country = 'US'::bpchar)
   ->  Seq Scan on emp_us employee_2  (cost=0.00..15.12 rows=2 width=168) (actual time=0.006..0.007 rows=1 loops=1)
         Filter: (country = 'US'::bpchar)
 Planning Time: 0.104 ms
 Execution Time: 0.045 ms

In order to scan only desired partitions, we need to set partitioning pruning property so that query planner will fetch from exact partitions only.

employee=# SET enable_partition_pruning = on;
SET
employee=# explain analyse select * from employee where country = 'US';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on emp_us employee  (cost=0.00..15.12 rows=2 width=168) (actual time=0.008..0.009 rows=1 loops=1)
   Filter: (country = 'US'::bpchar)
 Planning Time: 0.061 ms
 Execution Time: 0.020 ms

Conclusion: Choosing the right partitioning strategy or even if table requires partitioning becomes critically important.

Creating too many partitions inside the table can potentially lead to problems, like significant time spent on query plans, having to scan multiple partitions to fetch fewer rows.

Skewed partitions leading to poor data locality and bloated index of some of these larger partitions.

Table Partitioning in Postgres

List Partitioning

Range Partitioning

Hash Partitioning

Partitioning pruning

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Sudheer Gajula

No responses yet

More from Sudheer Gajula

Demystifying Pages in postgres

Often we have heard of term “Pages” , “Blocks” in database, Page is a fixed-length block of data, each page is of 8192 bytes, which is 8KB…

Handling GEO Spatial data in Postgres with H3

Geospatial data has important role in location based service, which is used to match supply and demand. For example — requesting ride from…

Exploring Database Indexing (part-2)

Concurrent Index Updates

Log4j 0-Day Vulnerability: All You Need To Know

During end of the year when lot of us were planning for vacations, vulnerability first came to light on December 9 and was highlighted by…

Recommended from Medium

How a Simple Query Brought Down Performance: A PostgreSQL Partition Pruning Mystery

The Curious Case of the Missing Pruning

Understanding PostgreSQL’s “ctid": A Hidden Gem for Efficient Row Updates 💎

PostgreSQL is widely known for its powerful features, including advanced indexing, full-text search, and the ability to handle massive…

Lists

data science and AI

Natural Language Processing

PostgreSQL Query Plan: A Guide for Understanding Execution

The PostgreSQL query plan is a detailed blueprint outlining how the database intends to execute a specific query. Understanding this plan…

How to combine Pgbackrest and Streaming Replication using Postgresql-16

In this blog we are going to discuss about Combining pgBackRest and Streaming Replication

How to Migrate from Oracle to PostgreSQL Using Ora2Pg: Step-by-Step Tutorial

Ora2Pg is an open-source tool designed to migrate Oracle databases to PostgreSQL. It automates the process of converting Oracle database…

Simplify JSON Data Handling in PostgreSQL with JSON_QUERY(), JSON_EXISTS(), and JSON_VALUE()…

Tired of wrestling with complex JSON queries in PostgreSQL? Discover how the new SQL/JSON functions — JSON_QUERY(), JSON_EXISTS(), and…