Report for Bloated Tables in Postgres and manual mitigation steps
Manually running vacuum jobs in Postgres
Determining what needs to be vacuumed
This report organizes tables by schema in order to plan a manual run of vacuum jobs. It only includes tables whose bloat is greater than 25%, and total storage reclamation is rolled up per schema so you can see where the most bloat lives. Actual table sizing information is provided to help estimate how long the vacuum jobs may take. Note that VACUUM FULL rewrites each table under an ACCESS EXCLUSIVE lock, so these jobs should be scheduled for windows when the tables can be unavailable.
select
	'<database name>' as db,
	coalesce(nspname, 'Grand Totals') as "schema",
	coalesce(nspname || '.' || relname,
		'----------------------------------------') as table_name,
	pg_size_pretty(sum(pg_total_relation_size(c.oid))) as actual_size,
	pg_size_pretty(sum(pg_total_relation_size(c.oid) - pg_relation_size(c.oid))) as reclaim_size_pretty,
	case
		when sum(pg_total_relation_size(c.oid)) > 0
			then (sum(pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) * 100 / sum(pg_total_relation_size(c.oid)))::integer
		else 0
	end as bloat_percentage,
	sum(pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) as reclaim_size,
	'VACUUM FULL VERBOSE ANALYZE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ';' as "sql"
from
	pg_class c
	join pg_namespace n on
		c.relnamespace = n.oid
where
	relkind = 'r'
	and nspname not like 'pg_%'
	and nspname != 'information_schema'
	and reltuples > 0
	and (pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) *
		100 / pg_total_relation_size(c.oid) > 25
	-- and pg_relation_size(c.oid) > 10033438720 -- specific size filter
group by
	rollup(nspname,
		relname)
having
	grouping(nspname) = 0
	or grouping(relname) = 1
order by
	nspname,
	case
		when grouping(relname) = 0 then 0
		else 1
	end,
	sum(pg_total_relation_size(c.oid)) asc;
Creating SQL scripts to perform vacuum runs
Using the report above, organize your VACUUM statements into a script per schema; the smaller and shorter schemas' statements can be combined into a single script file to reduce your amount of keyboard time.
- vacuum_<environment>_<schema>_<devops ticket #>.sql
- vacuum_other_schemas_<devops ticket #>.sql
File name examples:
vacuum_prod_other_stage_omop_49924.sql
vacuum_source_epic_clarity_49924.sql
vacuum_prod_omop_prod_49924.sql
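As a sketch of the convention above (the environment, schema, ticket number, and table names here are all illustrative), the statements taken from the report's "sql" column can be dropped into a per-schema script like so:

```shell
# Illustrative only: env/schema/ticket and the table names are placeholders;
# the real statements come from the "sql" column of the bloat report.
env=prod; schema=omop_prod; ticket=49924
script="vacuum_${env}_${schema}_${ticket}.sql"
{
  echo "VACUUM FULL VERBOSE ANALYZE omop_prod.person;"
  echo "VACUUM FULL VERBOSE ANALYZE omop_prod.measurement;"
} > "$script"
echo "wrote $script"
```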
Preparing the devops task
Attaching and placing the SQL scripts
Once the scripts are created, attach them to your devops task, then use your desktop SMB client to create a working directory on storage1 and place the scripts there.
Create the working directory via SMB:
Using your SMB explorer or client, create a directory named after your ticket number in the devops folder for the environment you're working in:
\\storage1.ris.wustl.edu\i2\Active\rdc2_etl\<environment>\devops\<devops ticket #>
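If you are already on a host with storage1 mounted, the same directory can be created from the shell instead of over SMB. A minimal sketch, assuming ${RDC_DEVOPS} points at the environment's devops folder (the ticket number and the fallback path are illustrative):

```shell
# Sketch: create the working directory on the Linux-side mount of storage1.
# RDC_DEVOPS is assumed to point at the environment's devops folder; the
# /tmp fallback below exists purely so the example runs anywhere.
ticket=49924
mkdir -p "${RDC_DEVOPS:-/tmp/rdc_devops_demo}/${ticket}"
echo "created ${RDC_DEVOPS:-/tmp/rdc_devops_demo}/${ticket}"
```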
Creating Execution Plans for Vacuuming
By logging into the RIS compute cluster and using the Lsub script for Postgres, the run_job command can quickly generate the commands needed to execute the SQL scripts created above.
Using the Lsub job for Postgres, log into the desired environment and run run_job for each vacuum script.
NOTE:
run_job does not actually execute anything; its output is a display of what should be executed in order to run the given SQL script against the Postgres environment.
cd ${RDC_DEVOPS}/<devops ticket>
# or, if this is a version release
cd ${RDC_REPOS}/<specific git repo>
run_job <connection index> <script file> <devops ticket #>
The output of the command (the execution plan) can be copied into your devops ticket's description. Do not execute anything output by run_job without the appropriate approvals.
Example run_job command run:
cd /storage1/fs1/I2/Active/rdc2_etl/prod/devops/49924
run_job 3 ./vacuum_prod_other_stage_omop.sql 49924
${PG_STRING} = -abe postgresql://rdc_etl_user@postgresql.prod.i2rdc3.wustl.edu:5432/omop_dw?sslmode=require&keepalives=1&keepalives_idle=600 --set=env=prod --set=ON_ERROR_STOP=1 --set=serv=postgresql.prod.i2rdc3.wustl.edu
${RDC_LOGS} = /storage1/fs1/I2/Active/rdc2_etl/prod/logs
execute:
set_connection 3
time psql ${PG_STRING} -f ./vacuum_prod_other_stage_omop.sql > $RDC_LOGS/vacuum_prod_other_stage_omop_49924_20241214-225216.log 2>&1
log_test ${RDC_LOGS}/vacuum_prod_other_stage_omop_49924_20241214-225216.log
Additional Notes:
If the task you are running targets the QA or Prod environments, approvals are required and must be requested before execution.
Related devops tickets
- https://dev.azure.com/wustl-i2/Planning and Administration/_workitems/edit/49924
- https://dev.azure.com/wustl-i2/Planning and Administration/_workitems/edit/49923
- https://dev.azure.com/wustl-i2/Planning and Administration/_workitems/edit/47874
- https://dev.azure.com/wustl-i2/Planning and Administration/_workitems/edit/47875