Maxing out max_parallel_workers in Postgres

February 22, 2023 - Postgres

There are a number of articles which discuss how to force the postgres query planner into a certain amount of parallelism (1, 2). Today I was trying to get a query to run with up to 64 parallel workers (for reasons), but ran into something which the other articles didn't mention explicitly.

To recap, there are a number of postgres parameters which affect parallelism:

max_worker_processes: A server-wide limit on the number of processes which can be created for parallelism. This must be configured in postgresql.conf, and requires a restart to be applied.
max_parallel_workers: This is a global limit on how much parallelism is possible in the database system at once (across all simultaneously-executing queries).
max_parallel_workers_per_gather: This effectively sets an upper limit on parallelism per query (this is probably not entirely correct - presumably it's possible to have multiple gathers running in parallel, but I haven't experimented with this).
parallel_leader_participation: This determines whether the leader process will participate in the parallel execution of a query.
parallel_tuple_cost: A planner cost estimate on how expensive it is to pass tuples from the parallel worker to the leader.
parallel_setup_cost: A planner cost estimate on how expensive it is to start up a worker process for a parallel query.
min_parallel_table_scan_size: The minimum table size to begin using a parallel scan.

Note: throughout this post I'm assuming that max_worker_processes is already set to be greater or equal to whatever we configure max_parallel_workers or max_parallel_workers_per_gather to.

The easiest way to enforce parallelism (in this case 8-way) is to set:

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;
SET parallel_leader_participation = off;
SET parallel_tuple_cost = 0; 
SET parallel_setup_cost = 0;
SET min_parallel_table_scan_size = 0;

This effectively tells the planner "don't worry, parallelism never costs anything" (which is not strictly true), and "please use up to 8 workers".

We can set these settings in a connection, and validate that they're working with the following:

CREATE TABLE data(val INTEGER);
INSERT INTO data (val) SELECT * FROM generate_series(1, 1000000) as val;

Now run:

EXPLAIN (ANALYZE) SELECT sum(val) FROM data;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5987.53..5987.54 rows=1 width=8) (actual time=20.987..24.387 rows=1 loops=1)
   ->  Gather  (cost=5987.50..5987.51 rows=8 width=8) (actual time=20.923..24.381 rows=8 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial Aggregate  (cost=5987.50..5987.51 rows=1 width=8) (actual time=15.873..15.874 rows=1 loops=8)
               ->  Parallel Seq Scan on data  (cost=0.00..5675.00 rows=125000 width=4) (actual time=0.007..7.945 rows=125000 loops=8)
 Planning Time: 0.055 ms
 Execution Time: 24.413 ms
(8 rows)

So far, nothing new - we're just rehashing what others have said.

At this point, you may run into a wall. This is determined by the number of disk pages used by the table (relpages in postgres-speak). The actual parallelism can be determined by taking the logarithm (base 3) of relpages for the table being scanned (for more details, take a look at the postgres sources). In the example that I constructed above, we didn't notice this wall.

To see the wall, let's now set:

SET max_parallel_workers = 16;
SET max_parallel_workers_per_gather = 16;

EXPLAIN (ANALYZE) SELECT sum(val) FROM data;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5987.53..5987.54 rows=1 width=8) (actual time=21.166..24.509 rows=1 loops=1)
   ->  Gather  (cost=5987.50..5987.51 rows=8 width=8) (actual time=21.099..24.501 rows=8 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial Aggregate  (cost=5987.50..5987.51 rows=1 width=8) (actual time=16.004..16.004 rows=1 loops=8)
               ->  Parallel Seq Scan on data  (cost=0.00..5675.00 rows=125000 width=4) (actual time=0.008..7.929 rows=125000 loops=8)
 Planning Time: 0.051 ms
 Execution Time: 24.535 ms
(8 rows)

Hmmm... There it is. It seems as though we can't get more than 8 parallel workers.

We can run the following query to understand the current upper limit on parallel workers for our table:

SELECT ceil(log(3, relpages)) FROM pg_class WHERE relname = 'data';

 ceil
------
    8
(1 row)

Note: relpages is an approximation, and is updated when an ANALYZE is run on the table. If you're following along and get an error like ERROR: cannot take logarithm of zero, you need to run VACUUME ANALYZE data; and then rerun the query above.

To get over this, we can override the number of parallel workers on a per-table basis, by setting:

ALTER TABLE data SET (parallel_workers = 64);

Now let's max it out:

SET max_parallel_workers = 64;
SET max_parallel_workers_per_gather = 64;

EXPLAIN (ANALYZE) SELECT sum(val) FROM data;

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=4620.48..4620.49 rows=1 width=8) (actual time=47.768..49.222 rows=1 loops=1)
   ->  Gather  (cost=4620.31..4620.32 rows=64 width=8) (actual time=14.635..49.192 rows=64 loops=1)
         Workers Planned: 64
         Workers Launched: 64
         ->  Partial Aggregate  (cost=4620.31..4620.32 rows=1 width=8) (actual time=1.957..1.958 rows=1 loops=64)
               ->  Parallel Seq Scan on data  (cost=0.00..4581.25 rows=15625 width=4) (actual time=0.003..1.000 rows=15625 loops=64)
 Planning Time: 0.051 ms
 Execution Time: 49.251 ms
(8 rows)

Whether this is actually sensible or not is an entirely different matter, but if you were wondering how it's done - now you know!