Types of Databases: PostgreSQL Partitioning

Partitioning: Divide large into small pieces to manage it effectively.

Where Partitioning will help to improve

- Useful where we have to periodically load and remove or purge the subsets of data

- Maintenance activities will complete quickly as compared to large tables.

- Performance will increase because partitions relevant to queries stores in mem so it reduces the fetch time.

- complex analytic queries can easily operates to individual partitions

History of partitioning in postgres

V8.1 (2005)

V9 :

Partitioning work with table inheritance, check constraints and trigger functions to redirect the data

Range and List Partitioning Supported.

V10

Partitioning Syntax added in this version.

No need to use a trigger function to redirect the data , internal function will work for this

Range and List Partitioning Supported.

v11

Support default partitions : where there is a no partition condition it will add in default partition

Hash partitions introduce

parallel partition scans

Optimizer partition eliminations

v12

Foreign keys can now reference partitioned tables

handling of tablespace assigned to partitioning.

psql enhancements

Range Partitioning : divide data according tpo the range

---Date/Time range

--- Price Range

--Year /Month

List Partitioning : any discrete list( no range)

--States/Cities

Hash Partitioning:

A table is partitioned by specifying a modulus and remainder for each partition.

Declarative Partitioning

Used a proper declarative syntax for creating a partitioning.

like Partition by range, Partition by list, Partition by hash

test=# create table item ( itemid int not null, logdate date not null, sales int ) partition by range (logdate);

CREATE TABLE

test=# create table item1 partition of item for values from ('2006-02-01') to ('2006-03-31');

CREATE TABLE

test=# create table item2 partition of item for values from ('2006-04-01') to ('2006-05-31');

CREATE TABLE

test=# \d+ item

Partitioned table "public.item"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition key: RANGE (logdate)

Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),

item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')

test=#

INSERTED SOME DATA ON PARENT TABLE

insert into item values(1,'2006-02-02',20);

But as you can see, the data size of the parent table is zero, because data resides in the child table.

test=# \dt+ item*

List of relations

--------+-------+-------------------+---------+-------------+------------+-------------

(3 rows)

Index is automatically created on the child table whenever I am creating an index on the parent table.

test=# create index item_idx on item(logdate);

CREATE INDEX

test=#

test=# \d+ item

Partitioned table "public.item"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition key: RANGE (logdate)

Indexes:

"item_idx" btree (logdate)

Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),

item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')

test=# \d+ item1

Table "public.item1"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition of: item FOR VALUES FROM ('2006-02-01') TO ('2006-03-31')

Partition constraint: ((logdate IS NOT NULL) AND (logdate >= '2006-02-01'::date) AND (logdate < '2006-03-31'::date))

Indexes:

"item1_logdate_idx" btree (logdate)

Access method: heap

test=# \d+ item2

Table "public.item2"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition of: item FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')

Partition constraint: ((logdate IS NOT NULL) AND (logdate >= '2006-04-01'::date) AND (logdate < '2006-05-31'::date))

Indexes:

"item2_logdate_idx" btree (logdate)

Access method: heap

test=#

Partition maintenance.

if you want to drop a partition table(example item3) then It will completely drop from the database.

test=# create table item3 partition of item for values from ('2006-06-01') to ('2006-07-31')

;

CREATE TABLE

test=#

test=# \d+ item

Partitioned table "public.item"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition key: RANGE (logdate)

Indexes:

"item_idx" btree (logdate)

Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),

item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31'),

item3 FOR VALUES FROM ('2006-06-01') TO ('2006-07-31')

test=# drop table item3;

DROP TABLE

test=#

test=# \d+ item

Partitioned table "public.item"

---------+---------+-----------+----------+---------+---------+--------------+-------------

Partition key: RANGE (logdate)

Indexes:

"item_idx" btree (logdate)

Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),

item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')

test=#

ALTER TABLE ITEM DETACH PARTITION item3; we can also remove or detach the child partition from parent partitions.

LIMITATIONS of Declarative Partitions

We have to manually create exclusion constraints on each partition. There is no command to create all partitions at onces

We have manually created unique constraints to maintain the uniqueness in each partition.

Partitions Pruning -Performance features;;;

your table or partition tables is only scanned when it's only required during query execution. system know if this data does n't is this partition so it won't scan it

We can enable and disable the feature, but by default it's ON.

set enable_partition_pruning='off'

test=# explain analyze select * from item where sales >= 20 and logdate > '2006-04-01';

QUERY PLAN

-----------------------------------------------------------------------------------------------------

Seq Scan on item2 item (cost=0.00..1.04 rows=1 width=12) (actual time=0.012..0.013 rows=1 loops=1)

Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))

Rows Removed by Filter: 2

Planning Time: 0.509 ms

Execution Time: 0.027 ms

(5 rows)

test=# set enable_partition_pruning ='off';

SET

test=# explain analyze select * from item where sales >= 20 and logdate > '2006-04-01';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------

Append (cost=0.00..2.10 rows=2 width=12) (actual time=0.041..0.043 rows=1 loops=1)

-> Seq Scan on item1 item_1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.029..0.030 rows=0 loops=1)

Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))

Rows Removed by Filter: 3

-> Seq Scan on item2 item_2 (cost=0.00..1.04 rows=1 width=12) (actual time=0.011..0.011 rows=1 loops=1)

Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))

Rows Removed by Filter: 2

Planning Time: 0.524 ms

Execution Time: 0.059 ms

(9 rows)

test=#

Declarative partitioning best practices:

Selection on primary key column or column which can divide logically
Used column used mostly used in where clause
Too less partition means index remain large
Too many partitions can increase the planning time.
carefully choose subpartition
verify nature of data

Constraints Exclusion:

(Scan the Constraint of all child tables and exclude child table which doesn't match the where clause it query)

similar to partition pruning , only it will work during query planning where partition puring works at execution time.