Partitioning: Divide large into small pieces to manage it effectively.
Where Partitioning will help to improve
- Useful where we have to periodically load and remove or purge the subsets of data
- Maintenance activities will complete quickly as compared to large tables.
- Performance will increase because partitions relevant to queries stores in mem so it reduces the fetch time.
- complex analytic queries can easily operates to individual partitions
History of partitioning in postgres
V8.1 (2005)
V9 :
Partitioning work with table inheritance, check constraints and trigger functions to redirect the data
Range and List Partitioning Supported.
V10
Partitioning Syntax added in this version.
No need to use a trigger function to redirect the data , internal function will work for this
Range and List Partitioning Supported.
v11
Support default partitions : where there is a no partition condition it will add in default partition
Hash partitions introduce
parallel partition scans
Optimizer partition eliminations
v12
Foreign keys can now reference partitioned tables
handling of tablespace assigned to partitioning.
psql enhancements
Range Partitioning : divide data according tpo the range
---Date/Time range
--- Price Range
--Year /Month
List Partitioning : any discrete list( no range)
--States/Cities
Hash Partitioning:
A table is partitioned by specifying a modulus and remainder for each partition.
Declarative Partitioning
Used a proper declarative syntax for creating a partitioning.
like Partition by range, Partition by list, Partition by hash
test=# create table item ( itemid int not null, logdate date not null, sales int ) partition by range (logdate);
CREATE TABLE
test=# create table item1 partition of item for values from ('2006-02-01') to ('2006-03-31');
CREATE TABLE
test=# create table item2 partition of item for values from ('2006-04-01') to ('2006-05-31');
CREATE TABLE
test=# \d+ item
Partitioned table "public.item"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition key: RANGE (logdate)
Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),
item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')
test=#
INSERTED SOME DATA ON PARENT TABLE
insert into item values(1,'2006-02-02',20);
But as you can see, the data size of the parent table is zero, because data resides in the child table.
test=# \dt+ item*
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+-------+-------------------+---------+-------------+------------+-------------
public | item | partitioned table | pawan | permanent | 0 bytes |
public | item1 | table | pawan | permanent | 8192 bytes |
public | item2 | table | pawan | permanent | 8192 bytes |
(3 rows)
Index is automatically created on the child table whenever I am creating an index on the parent table.
test=# create index item_idx on item(logdate);
CREATE INDEX
test=#
test=# \d+ item
Partitioned table "public.item"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition key: RANGE (logdate)
Indexes:
"item_idx" btree (logdate)
Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),
item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')
test=# \d+ item1
Table "public.item1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition of: item FOR VALUES FROM ('2006-02-01') TO ('2006-03-31')
Partition constraint: ((logdate IS NOT NULL) AND (logdate >= '2006-02-01'::date) AND (logdate < '2006-03-31'::date))
Indexes:
"item1_logdate_idx" btree (logdate)
Access method: heap
test=# \d+ item2
Table "public.item2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition of: item FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')
Partition constraint: ((logdate IS NOT NULL) AND (logdate >= '2006-04-01'::date) AND (logdate < '2006-05-31'::date))
Indexes:
"item2_logdate_idx" btree (logdate)
Access method: heap
test=#
Partition maintenance.
if you want to drop a partition table(example item3) then It will completely drop from the database.
test=# create table item3 partition of item for values from ('2006-06-01') to ('2006-07-31')
;
CREATE TABLE
test=#
test=#
test=# \d+ item
Partitioned table "public.item"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition key: RANGE (logdate)
Indexes:
"item_idx" btree (logdate)
Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),
item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31'),
item3 FOR VALUES FROM ('2006-06-01') TO ('2006-07-31')
test=# drop table item3;
DROP TABLE
test=#
test=#
test=# \d+ item
Partitioned table "public.item"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+---------+-----------+----------+---------+---------+--------------+-------------
itemid | integer | | not null | | plain | |
logdate | date | | not null | | plain | |
sales | integer | | | | plain | |
Partition key: RANGE (logdate)
Indexes:
"item_idx" btree (logdate)
Partitions: item1 FOR VALUES FROM ('2006-02-01') TO ('2006-03-31'),
item2 FOR VALUES FROM ('2006-04-01') TO ('2006-05-31')
test=#
ALTER TABLE ITEM DETACH PARTITION item3; we can also remove or detach the child partition from parent partitions.
LIMITATIONS of Declarative Partitions
We have to manually create exclusion constraints on each partition. There is no command to create all partitions at onces
We have manually created unique constraints to maintain the uniqueness in each partition.
Partitions Pruning -Performance features;;;
your table or partition tables is only scanned when it's only required during query execution. system know if this data does n't is this partition so it won't scan it
We can enable and disable the feature, but by default it's ON.
set enable_partition_pruning='off'
test=# explain analyze select * from item where sales >= 20 and logdate > '2006-04-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on item2 item (cost=0.00..1.04 rows=1 width=12) (actual time=0.012..0.013 rows=1 loops=1)
Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))
Rows Removed by Filter: 2
Planning Time: 0.509 ms
Execution Time: 0.027 ms
(5 rows)
test=# set enable_partition_pruning ='off';
SET
test=# explain analyze select * from item where sales >= 20 and logdate > '2006-04-01';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Append (cost=0.00..2.10 rows=2 width=12) (actual time=0.041..0.043 rows=1 loops=1)
-> Seq Scan on item1 item_1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.029..0.030 rows=0 loops=1)
Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))
Rows Removed by Filter: 3
-> Seq Scan on item2 item_2 (cost=0.00..1.04 rows=1 width=12) (actual time=0.011..0.011 rows=1 loops=1)
Filter: ((sales >= 20) AND (logdate > '2006-04-01'::date))
Rows Removed by Filter: 2
Planning Time: 0.524 ms
Execution Time: 0.059 ms
(9 rows)
test=#
Declarative partitioning best practices:
- Selection on primary key column or column which can divide logically
- Used column used mostly used in where clause
- Too less partition means index remain large
- Too many partitions can increase the planning time.
- carefully choose subpartition
- verify nature of data
Constraints Exclusion:
(Scan the Constraint of all child tables and exclude child table which doesn't match the where clause it query)
similar to partition pruning , only it will work during query planning where partition puring works at execution time.