Troubleshooting Upgrade Performance
If you are experiencing slow performance issues during the upgrade procedure, the following checks can be used to troubleshoot them.
- Ensure that the system is not idle and there is at least one process is consuming your CPU by 100%:
top
outputs 100% for some processes.top
(orps
) outputs processes in “R” state, but there is no “D” state.
- Ensure that the disk system performance is ok on the idle system:
top
does not output processes in “D” state.top
“wa” is not higher than 0.5 * 100/$NUM_OF_CPUs.- If it is, run
iotop
to find any processes that consume I/O.
- If it is, run
- Ensure that RAM utilization is ok:
cat /proc/meminfo
displays that more than 70% is occupied by “Cached”.
-
Make sure there are no suspicious open connections to the database.
select pid, datname, usename, client_addr, backend_start, query_start, state, query last_query from pg_stat_activity where pid != pg_backend_pid() order by datname, usename, query_start
If the query returns any result, it is worth investigating all connection sources and stopping or removing them to avoid resource concurrency, which can lead to an increased upgrade duration. We recommend stopping all connection sources using standard tools or commands, but if this cannot be done, you can interrupt connections forcibly by passing
<pid>
to functionpg_terminate_backend
:select pg_terminate_backend(<pid>);
-
Check database sessions:
select pid, datname, usename, client_addr, backend_start, now() - query_start query_runtime, case when wait_event_type is null then 'false' else 'true' end as locked, state, wait_event_type from pg_stat_activity where state != 'idle' and pid != pg_backend_pid() order by datname, state;
If the upgrade session is not listed, it means that the upgrade process is hanging for some other reason, not because of the database.
select bda.pid blocked_pid, bda.usename blocked_user, bda.application_name blocked_app_name, bdd.datname blocked_db, bdc.relname blocked_relation, coalesce(bda.client_addr||':'||bda.client_port, 'socket?') blocked_addr, coalesce(bda.client_hostname, 'socket?') blocked_host, now()-bda.query_start blocked_duration, bda.query blocked_query, bdl.locktype blocked_locktype, bdl.mode blocked_lockmode, coalesce(bitrl.pid, bitl.pid) block_pid, coalesce(bitra.usename, bita.usename) block_user, coalesce(bitra.application_name, bita.application_name) block_app_name, coalesce(bitra.client_addr||':'||bitra.client_port, bita.client_addr||':'||bita.client_port, 'socket?') block_addr, coalesce(bitra.client_hostname, bita.client_hostname, 'socket?') block_host, now()-coalesce(bitra.query_start, bita.query_start) block_last_activity, coalesce(bitra.query, bita.query) block_last_query, coalesce(bitra.state, bita.state) block_state, coalesce(bitrl.locktype, bitl.locktype) block_locktype, coalesce(bitrl.mode, bitl.mode) block_lockmode, coalesce(bitrl.granted, bitl.granted) block_granted from pg_stat_activity bda join pg_locks bdl on bdl.pid = bda.pid and not bdl.granted left join pg_database bdd on bdd.oid = bdl.database left join pg_class bdc on bdc.oid = bdl.relation left join pg_locks bitrl on bitrl.transactionid = bdl.transactionid and bitrl.pid != bdl.pid left join pg_stat_activity bitra on bitra.pid = bitrl.pid left join pg_locks bitl on bitl.database = bdl.database and bitl.relation = bdl.relation and bitl.pid != bdl.pid left join pg_stat_activity bita on bita.pid = bitl.pid where bda.wait_event_type is not null --and bda.pid = <pid>;
If true, the blocker can be identified.
-
Check whether there are long running database queries.
If the upgrade session is not blocked, check whether a heavy query is running. If the query runs for more than 5 minutes (fieldquery_runtime
):- Check whether any HW performance problems exist.
- If the query runs for more than half an hour, gather data and involve experts:
Get query text by running the following query:
select query from pg_stat_activity where pid = <pid>
- Get an execution plan for the obtained query by running
EXPLAIN
<query_text>.