Partitioning in PostgreSQL

Partitioning refers to splitting a large table into smaller pieces. This article covers the basics of partitioning in PostgreSQL.

Currently, PostgreSQL supports range and list partitioning via table inheritance. Basically, you have to create each partition as a child table of the master table. Generally, if you want to split data into specific ranges, then use range partitioning.

Date and timestamp values are good examples for this. For example, you may want to split yearly data into quarterly partitions. Similarly, if you want to split data based on a specific list of values, then use list partitioning. For this, you can consider creating partitions for each state of a country. Generally, in range partitioning, you would be adding more partitions over time, whereas in list partitioning the data will be continuously growing within each partition. This design decision is very important from the partition maintenance point of view, which most people ignore.

Although PostgreSQL does not offer dedicated syntax for creating partitioned tables, like that found in some commercial RDBMSes, you may wonder exactly how to set up partitioning in PostgreSQL. You will be glad to know that PostgreSQL uses the concepts of inheritance, triggers and constraint exclusion to support partitioning. Additionally, you can use rules instead of triggers.

Now, inheritance and constraint exclusion come with their own pros and cons. There are various scenarios that you need to take care of. Sharing details on them is the main goal of this article. Further, once that has been addressed, the next goal should be to automate partition maintenance because you would most certainly like to automate the creation and maintenance part of partitioning. As you will realise by the end of this article, partition maintenance is another important task, apart from setting up the partitions.

Partitioning using inheritance

Before we begin, you should note that there are various ways to set up partitioning. The scheme that I am going to show here is just one of many. You would need to tweak it as per the actual requirements.

Before we start, make sure that the constraint_exclusion configuration option is set to either ON or PARTITION in the postgresql.conf file. This is absolutely necessary to get the partitioning scheme working.
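For reference, the setting can also be inspected and changed for the current session, without editing the file:

```sql
-- Check the current value:
SHOW constraint_exclusion;

-- The 'partition' value applies the check only to queries over
-- inheritance trees and UNION ALL, which is exactly what we need here:
SET constraint_exclusion = partition;
```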

First, let us create an empty master table without any primary key. If you are wondering about the missing primary key, then you should know that this is an empty master table, and that we need to set up a primary key on each of the partitions separately (more on this in the latter part of the article):

CREATE TABLE orders (
     id            INT NOT NULL,
     address       TEXT NOT NULL,
     order_date    TIMESTAMP NOT NULL
);

Now, create some partitions using non-overlapping CHECK constraints:

CREATE TABLE orders_part_2011 (
    CHECK (order_date >= DATE '2011-01-01'
                    AND order_date < DATE '2012-01-01')

) INHERITS (orders);

CREATE TABLE orders_part_2010 (
    CHECK (order_date < DATE '2011-01-01')
) INHERITS (orders);

Here, the partition orders_part_2011 will hold all data for the year 2011, whereas orders_part_2010 will hold all previous data. Note that I am not using any primary key clause, as order_date cannot be a primary key in this particular setup. Further, always make sure that you match the data-type of the values used with that of the column referred to in the CHECK constraints.

Optionally, you may create indexes on the partitioning key columns for each of the partitions:

CREATE INDEX orders_part_2011_idx ON orders_part_2011(order_date);
CREATE INDEX orders_part_2010_idx ON orders_part_2010(order_date);

The partitioning scheme is not yet ready. What I really want is that the data inserted into the master table should be redirected to the appropriate partition. This is where I need a trigger or rule. Triggers and rules have their own pros and cons. Here I use triggers, with trigger functions.

Again, a trigger function can be written in many ways, depending on the requirements. I will show two ways of writing one, along with their advantages and disadvantages. Let’s create a trigger function as shown below:

CREATE OR REPLACE FUNCTION orders_insert_simple() RETURNS TRIGGER AS $$
DECLARE
    vsql TEXT;
BEGIN
    vsql := 'INSERT INTO orders_part_' || to_char(NEW.order_date, 'YYYY') ||
            ' VALUES (' || NEW.id || ',' || quote_literal(NEW.address) ||
            ',' || quote_literal(NEW.order_date) || ')';
    EXECUTE vsql;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Now we need to attach this trigger function to a trigger, to handle the INSERT on the master table:

CREATE TRIGGER orders_insert_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE orders_insert_simple();

There are quite a few interesting things used in the trigger function that you need to closely look at. It is important to note that the partition name is computed on the fly. This works for our particular case, as all the required partitions were created beforehand. This function would not require any updates, even if you add new partitions for the next year, and so on.

Looks easy? Then take a close look again, at the part where I have used quote_literal to quote the input values. Essentially, you need to quote each and every possible value that is going to be inserted into the partitions.

Similarly, you would need to take care of NULL values. This is the most cumbersome part of using this easy-to-write trigger function, so let’s try out another way of writing the trigger function:

CREATE OR REPLACE FUNCTION orders_insert() RETURNS TRIGGER AS
$$
BEGIN
    IF (NEW.order_date >= DATE '2011-01-01' AND NEW.order_date < DATE '2012-01-01') THEN
        INSERT INTO orders_part_2011 VALUES (NEW.*);
    ELSIF (NEW.order_date < DATE '2011-01-01') THEN
        INSERT INTO orders_part_2010 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range. Check the orders_insert() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER orders_insert_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE orders_insert();

As you can see, we are now explicitly checking order_date to redirect data into the appropriate partition. This approach needs no quoting or NULL handling, at least for the non-partition-key columns, since I have simply used NEW.* instead of referring to individual column values.

In case you are not able to create all partitions beforehand, you would need to update the above trigger function with the logic to create partitions on-the-fly. Generally, we use separate triggers for partition creation and data redirection. Obviously, the partition creation trigger should always be fired before the redirection trigger. You should take care to avoid race conditions between those two triggers.
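As an illustration only (the function name and the pg_class lookup are my own choices, not part of the setup above), such a partition-creation trigger function might be sketched like this. Note that PostgreSQL fires same-event triggers in alphabetical order of their names, so name the creation trigger so that it sorts before the redirection trigger:

```sql
-- A minimal sketch of on-the-fly yearly partition creation.
-- Concurrent inserts can still race between the existence check and the
-- CREATE TABLE, so the duplicate_table error is caught and ignored.
CREATE OR REPLACE FUNCTION orders_create_partition() RETURNS TRIGGER AS $$
DECLARE
    part_name  TEXT := 'orders_part_' || to_char(NEW.order_date, 'YYYY');
    year_start DATE := date_trunc('year', NEW.order_date)::date;
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = part_name) THEN
        BEGIN
            EXECUTE 'CREATE TABLE ' || quote_ident(part_name) ||
                    ' (CHECK (order_date >= ' || quote_literal(year_start) ||
                    ' AND order_date < ' ||
                    quote_literal(year_start + interval '1 year') ||
                    ')) INHERITS (orders)';
        EXCEPTION WHEN duplicate_table THEN
            NULL;  -- another session created it first; nothing to do
        END;
    END IF;
    RETURN NEW;  -- hand the row on to the redirection trigger
END;
$$ LANGUAGE plpgsql;
```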

Now that the trigger is in place, let’s insert some records and verify the setup:

pg=# INSERT INTO orders VALUES(1, 'pune', '2011-08-22');
INSERT 0 0
pg=# INSERT INTO orders VALUES(2, 'pune', '2010-02-22');
INSERT 0 0
pg=# UPDATE orders SET address = 'bengaluru' WHERE id = 2;
UPDATE 1

Do not panic on seeing the INSERT statement messages — they simply convey that zero records were inserted in the base (master) table. Actually, the records were transparently inserted into the related partitions.

Another thing to note about the UPDATE statement is that you are not supposed to update the partitioning key. Basically, any change to the partitioning key value might require moving that record to another partition, which is termed row movement.

A simple way to handle queries that do update the partitioning key is to capture the UPDATE query, delete the related record from the partition, and then fire an INSERT on the base table. Then the partitioning mechanism will kick in and redirect the record to the appropriate partition. This is not handled in the current setup, as I am updating the address value, which is a non-partition-key column.
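If you did need to support such updates, a delete-and-reinsert trigger could be sketched as below. Two caveats, both assumptions of this sketch rather than part of the setup above: row-level triggers on the empty parent never fire for rows stored in the children, so the trigger must be attached to each partition; and the id column is assumed to uniquely identify a row, which nothing here enforces:

```sql
-- A sketch of delete-and-reinsert row movement for partition-key updates.
CREATE OR REPLACE FUNCTION orders_row_move() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.order_date = OLD.order_date THEN
        RETURN NEW;                           -- key unchanged, normal update
    END IF;
    DELETE FROM orders WHERE id = OLD.id;     -- remove the old row
    INSERT INTO orders VALUES (NEW.*);        -- redirected by the INSERT trigger
    RETURN NULL;                              -- cancel the original update
END;
$$ LANGUAGE plpgsql;

-- Attach to every partition, since triggers are not inherited:
CREATE TRIGGER orders_part_2010_move BEFORE UPDATE OF order_date
ON orders_part_2010 FOR EACH ROW EXECUTE PROCEDURE orders_row_move();

CREATE TRIGGER orders_part_2011_move BEFORE UPDATE OF order_date
ON orders_part_2011 FOR EACH ROW EXECUTE PROCEDURE orders_row_move();
```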

Let’s now check where the records are stored:

SELECT * FROM orders;
 id |  address  |     order_date
----+-----------+--------------------
  1 | pune      | 22-AUG-11 00:00:00
  2 | bengaluru | 22-FEB-10 00:00:00
(2 rows)

PostgreSQL knows about the child tables of the orders table, so it assumes that the user wants all the data, from the parent as well as all the children. The partitions are normal tables, and you can query them as usual:

SELECT * FROM orders_part_2011;
 id | address |     order_date
----+---------+--------------------
  1 | pune    | 22-AUG-11 00:00:00
(1 row)

SELECT * FROM orders_part_2010;
 id |  address  |     order_date
----+-----------+--------------------
  2 | bengaluru | 22-FEB-10 00:00:00
(1 row)

You can also check if the master table is really empty, by using the ONLY clause, which restricts the lookup to only the table specified in the statement:

SELECT * FROM ONLY orders;
 id | address | order_date
----+---------+------------
(0 rows)

Querying over partitions

Use the EXPLAIN feature to check the plan for querying over partitions:

EXPLAIN SELECT * FROM orders WHERE order_date = '02-JAN-11';
                                       QUERY PLAN
-----------------------------------------------------------------
 Result  (cost=0.00..26.01 rows=7 width=40)
   ->  Append  (cost=0.00..26.01 rows=7 width=40)
         ->  Seq Scan on orders  (cost=0.00..23.75 rows=6 width=44)
               Filter: (order_date = '02-JAN-11 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_part_2011 orders  (cost=0.00..2.26 rows=1 width=18)
               Filter: (order_date = '02-JAN-11 00:00:00'::timestamp without time zone)
(6 rows)

In the above output, you see that only one partition was scanned, based on the WHERE clause conditions. Let’s look at another example:

EXPLAIN SELECT * FROM orders WHERE order_date = now();
                                     QUERY PLAN
-----------------------------------------------------------------
 Result  (cost=0.00..30.03 rows=8 width=41)
   ->  Append  (cost=0.00..30.03 rows=8 width=41)
         ->  Seq Scan on orders  (cost=0.00..26.50 rows=6 width=44)
               Filter: (order_date = now())
         ->  Seq Scan on orders_part_2011 orders  (cost=0.00..2.51 rows=1 width=18)
               Filter: (order_date = now())
         ->  Seq Scan on orders_part_2010 orders  (cost=0.00..1.01 rows=1 width=44)
               Filter: (order_date = now())
(8 rows)

Here all the partitions are scanned — definitely not what we wanted! You should be aware of the fact that the planner analyses the query before the values from the parameters or stored procedures are substituted. As the planner does not know the exact value of now() during the planning phase, it cannot prune partitions, and so scans all the partitions. You need to look out for such cases where constant values are expected. In case you are planning to use functions in the WHERE clause, do make sure to understand the various types of functions that can be created in PostgreSQL.
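For instance, replacing the volatile expression with a constant of the right type lets the planner compare the literal against the CHECK constraints and prune partitions at planning time:

```sql
-- With a constant, orders_part_2010 should disappear from the plan:
EXPLAIN SELECT * FROM orders WHERE order_date = DATE '2011-08-22';
```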

Let us now go through the finer details of the three features used in PostgreSQL partitioning.

Constraint exclusion

Constraint exclusion works with only range or equality check constraints. It might not work for constraints like the following:

ALTER TABLE product_items_j ADD CONSTRAINT chk_item_name CHECK (item_name LIKE 'P%');

More importantly, the WHERE condition should be similar to the CHECK constraints. For example, if you have the following CHECK constraint:

ALTER TABLE product_items_j ADD CONSTRAINT chk_item_name CHECK (item_name BETWEEN 'P' AND 'PB');

Then constraint exclusion would not help for the following query, since the WHERE condition is not similar to the CHECK constraint:

SELECT item_name FROM product_items WHERE item_name LIKE 'Q%' ;

To get the constraint exclusion working, you need this form of query:

SELECT item_name FROM product_items WHERE item_name = 'Pen';

From the above examples, it should be clear that the user needs to take extra effort while handling the WHERE clause predicates, in order to hit the CHECK constraints. The other major problem with this setup is that there is no automatic way to verify if all the CHECK constraints are mutually exclusive. You need to be extra careful while setting them up.

The EXPLAIN feature comes in very handy to tackle these issues. Basically, any change to the query or table, even the slightest one, should be rigorously followed by looking at the EXPLAIN output. If you do not see the expected plan, then either the WHERE clause or the CHECK constraints need to be looked into.

Inheritance

The important rule that you should always be aware of is that child tables inherit only column DEFAULT values, NOT NULL constraints and CHECK constraints. Any other constraints, like UNIQUE, PRIMARY KEY and FOREIGN KEY, will not be inherited. Also, indexes, ownership and permissions will not be inherited.

One has to set up the non-inherited constraints on all the child tables. However, there are various ways of setting up these constraints, which we will discuss in the very next section. The child tables cannot rename inherited columns, but can add new columns — this is the biggest advantage of using inheritance.

You should also note that one can enable or disable inheritance on child tables by using the ALTER TABLE command. This is, in fact, very useful when you want to remove or move partitions from the base table.
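Detaching and re-attaching a partition this way comes down to single ALTER TABLE statements:

```sql
-- Detach the 2010 partition; the table and its data remain,
-- but queries on the parent no longer see them:
ALTER TABLE orders_part_2010 NO INHERIT orders;

-- Re-attach it later (make sure its CHECK constraint is still valid):
ALTER TABLE orders_part_2010 INHERIT orders;
```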

Using inheritance, you can have multiple levels of partitioning. You can always create a great-great-grandchild of a partition, but you might experience a performance loss with this kind of setup, since more base tables are involved, directly or indirectly. So generally, 1 or 2 levels for partitions and sub-partitions should be good. If you are going any deeper, then most probably you need to rethink the partitioning scheme employed.

Uniqueness

PostgreSQL uses indexes for supporting uniqueness. However, there are no multi-table indexes, which means that an index cannot span over more than one table. This is the reason for not having any primary key in our setup. In general, you might want to concentrate on indexing the partitioning key columns. For this, create non-overlapping CHECK constraints and then create a unique index on partitioning key columns over each partition. This is almost like having unique values over all partitions.

For indexing the non-partitioning-key columns, you create unique indexes over each partition. However, for verifying uniqueness, you need custom functions to scan all partitions, which would hopefully use the related indexes.

As you must have realised, this scanning of all partitions was not what we wanted, to begin with. Also, due to the lack of multi-table indexes, maintaining a primary key or a foreign key is technically not possible.

Consider that you created a primary key on the id column of each of the child tables. Now it may happen that both orders_part_2011 and orders_part_2010 have an id value of 1. Even though the uniqueness constraint within each child table is upheld, the base table orders now effectively has duplicate id values.

To overcome this, you can create an additional lookup table, with primary keys from the base as well as child tables. Basically, you need to use triggers to keep the lookup table updated as you insert, update or delete records from the related partitioned table. Now all the primary and foreign key constraints can be mapped to this lookup table.
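A minimal sketch of that lookup-table idea follows; the table and function names are illustrative, not part of the setup above:

```sql
-- Global uniqueness of id is enforced by this single-table primary key:
CREATE TABLE orders_ids (
    id INT PRIMARY KEY
);

CREATE OR REPLACE FUNCTION orders_ids_sync() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO orders_ids VALUES (NEW.id);   -- fails on a duplicate id
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM orders_ids WHERE id = OLD.id;
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach to every partition, since triggers are not inherited:
CREATE TRIGGER orders_part_2010_ids BEFORE INSERT OR DELETE
ON orders_part_2010 FOR EACH ROW EXECUTE PROCEDURE orders_ids_sync();

CREATE TRIGGER orders_part_2011_ids BEFORE INSERT OR DELETE
ON orders_part_2011 FOR EACH ROW EXECUTE PROCEDURE orders_ids_sync();
```

Foreign keys from other tables can then reference orders_ids(id) instead of the partitioned table itself.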

That’s it! These are basic setup issues that you must be able to get through now. You should now be ready to face the next challenge — performance tuning for partitions.

What’s around the corner?

The PostgreSQL community does have several ongoing discussions on this feature. For more details, you can visit the PostgreSQL wiki entry on table partitioning.
