Beena Emerson: Benchmark Partition Table – 1

Feed: Planet PostgreSQL.

With the addition of declarative partitioning in PostgreSQL 10, it only made sense to extend the existing pgbench benchmarking module to create partitioned tables. A recently committed patch by Fabien Coelho for PostgreSQL 13 has made this possible.


The pgbench_accounts table can now be partitioned with the --partitions and --partition-method options, which specify the number of partitions and the partitioning method, respectively, when the database is initialized.


pgbench -i --partitions <num> [--partition-method <method>]

partitions : must be a positive integer
partition-method : currently only range and hash are supported; the default is range.

pgbench will throw an error if --partition-method is specified without a valid --partitions option.

For range partitions, the key range is split equally across the specified number of partitions. The lower bound of the first partition is MINVALUE and the upper bound of the last partition is MAXVALUE. For hash partitions, the specified number of partitions is used in the modulo operation.
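
For example, a scale-5000 database with 100 hash partitions could be initialized as follows; the database name bench is only a placeholder, not necessarily what was used for the tests below.

# 100 hash partitions of pgbench_accounts ("bench" is an illustrative database name)
pgbench -i --scale=5000 --partitions=100 --partition-method=hash bench

# with only --partitions given, the default range method is used
pgbench -i --scale=5000 --partitions=100 bench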

Test Partitions

I performed a few tests using the new partition options with the following settings (the server settings are also shown as a postgresql.conf sketch after the list):
  • pgbench scale = 5000 (~63GB data + 10GB indexes)
  • pgbench thread/client count = 32
  • shared_buffers = 32GB
  • min_wal_size = 15GB
  • max_wal_size = 20GB
  • checkpoint_timeout=900
  • maintenance_work_mem=1GB
  • checkpoint_completion_target=0.9
  • synchronous_commit=on
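
For reference, the server settings above correspond roughly to the following postgresql.conf entries; this is my reading of the listed values (checkpoint_timeout taken as seconds), not a copy of the actual configuration file used.

shared_buffers = 32GB
min_wal_size = 15GB
max_wal_size = 20GB
checkpoint_timeout = 900              # seconds, i.e. 15 minutes
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
synchronous_commit = on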


The hardware specification of the machine on which the benchmarking was performed is as follows:
  • IBM POWER8 Server
  • Red Hat Enterprise Linux Server release 7.1 (Maipo) (with kernel Linux 3.10.0-229.14.1.ael7b.ppc64le)
  • 491GB RAM
  • IBM,8286-42A CPUs (24 cores, 192 with HT)



Two different types of queries were tested:
  1. Read-only default query: It was run using the existing -S option of pgbench.

  2. Range query: The following custom script, which searches a range covering about 0.2% of the total rows, was used; an example invocation is shown after the script.

\set v1 random(1, 100000 * :scale)
\set v2 :v1 + 1000000
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN :v1 AND :v2;
END;
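
Such a custom script is run by passing it to pgbench with -f; the invocation below is only a sketch, with range_query.sql and the 300-second duration as placeholders rather than the exact parameters behind the results.

# read-only default query: 32 clients, 32 threads
pgbench -S -c 32 -j 32 -T 300 bench

# custom range query script (file name and duration are illustrative)
pgbench -n -c 32 -j 32 -T 300 -f range_query.sql bench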


Tests were run for both range and hash partition types. The following table shows the median of three tps readings and the percentage change in tps compared to the non-partitioned table.

non-partitioned: 323331.60 tps (read-only default query), 35.36 tps (range query)

| Partitions | Read-only, range: tps (% change) | Read-only, hash: tps (% change) | Range query, range: tps (% change) | Range query, hash: tps (% change) |
|------------|----------------------------------|---------------------------------|------------------------------------|-----------------------------------|
| 100        | 201648.82 (-37.63 %)             | 208805.45 (-35.42 %)            | 36.92 (+4.40 %)                    | 35.31 (-0.16 %)                   |
| 200        | 189642.09 (-41.35 %)             | 199718.17 (-38.23 %)            | 37.63 (+6.42 %)                    | 34.34 (-2.90 %)                   |
| 300        | 191242.31 (-40.85 %)             | 203182.88 (-37.16 %)            | 38.33 (+8.38 %)                    | 34.01 (-3.82 %)                   |
| 400        | 186329.88 (-42.37 %)             | 189118.42 (-41.51 %)            | 49.43 (+39.78 %)                   | 34.86 (-1.44 %)                   |
| 500        | 189727.31 (-41.32 %)             | 195470.47 (-39.54 %)            | 48.39 (+36.83 %)                   | 33.19 (-6.13 %)                   |
| 600        | 185143.62 (-42.74 %)             | 191237.48 (-40.85 %)            | 45.42 (+28.44 %)                   | 32.42 (-8.32 %)                   |
| 700        | 179190.37 (-44.58 %)             | 178999.73 (-44.64 %)            | 42.18 (+19.29 %)                   | 32.57 (-7.91 %)                   |
| 800        | 170432.79 (-47.29 %)             | 173027.42 (-46.49 %)            | 45.82 (+29.57 %)                   | 31.38 (-11.28 %)                  |

Read-only Default Query
In this type of OLTP point query, we are selecting only one row. Internally, an index scan is performed on the pgbench_accounts_pkey for the value being queried. In the non-partitioned case, the index scan is performed on the only index present. However, for the partitioned case, the partition details are collected and then partition pruning is carried out before performing an index scan on the selected partition. 
As the results show, the two partition types behave similarly here, because only one row in one particular partition is targeted. The drop in performance relative to the non-partitioned case can be attributed to the overhead of handling a large number of partitions, and the performance slowly degrades as the number of partitions increases.
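
The pruning step can be observed directly with EXPLAIN; this is only an illustrative check, not output from the benchmark runs. The built-in -S script issues an equivalent point lookup on aid.

-- Illustrative only: with a constant aid, plan-time pruning should leave
-- a single partition, and the plan shows one index scan on that partition
-- instead of scans over every partition.
EXPLAIN (COSTS OFF)
SELECT abalance FROM pgbench_accounts WHERE aid = 12345;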

Range Custom Query

In this type of query, one million rows, about 0.2% of the total entries, are targeted in sequence. In the non-partitioned case, the single primary key index is scanned for the entire range. As in the previous case, for the partitioned table, partition pruning is attempted before the index scan is performed on the smaller indexes of the selected partitions.

Given how the different partition types distribute the rows, the queried range spans only a couple of adjacent partitions with range partitioning, but is scattered across all the partitions with hash partitioning. As expected, the range type fares much better in this scenario because of the narrowed search being performed. The hash type performs worse, as it is effectively doing a full index search, like in the non-partitioned case, while also bearing the overhead of partition handling.
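
The difference in pruning behaviour can be checked with the same range predicate; again, this is an illustrative query rather than part of the benchmark.

-- Illustrative only: on a range-partitioned pgbench_accounts, only the
-- partitions overlapping the interval remain after pruning; on a
-- hash-partitioned table a BETWEEN condition cannot be pruned, so every
-- partition appears under the Append node.
EXPLAIN (COSTS OFF)
SELECT abalance FROM pgbench_accounts
WHERE aid BETWEEN 1000000 AND 2000000;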
We can see that range-partitioned tables are very beneficial when the majority of the queries are range queries. We have not seen any benefit for hash partitions in these tests, but they are expected to fare better in certain scenarios involving sequential scans. We can conclude that the partition type and other partitioning parameters should be chosen only after thorough analysis, as a poorly chosen partitioning scheme can drastically decrease overall performance.

I want to extend a huge thank you to all those who have contributed to this much-needed feature which makes it possible to benchmark partitioned tables: Fabien Coelho, Amit Kapila, Amit Langote, Dilip Kumar, Asif Rehman, and Alvaro Herrera.

  
This blog is also published on postgresrocks.

