I recently had a customer who was trying to figure out why his
VACUUM-for-wraparound was taking so long–he had a 6TB database, and the
VACUUM had been running for over 4 days. It’s unfortunate that there’s no progress meter when
VACUUMs run (especially when autovacuum kicked off a VACUUM in the background), but with a little help from
strace, and some wrangling with simple Bash commands like
ps, we can infer the progress and activity going on while
VACUUM is running.
Without too much detail, it’s important to know that a
VACUUM in PostgreSQL consists of 3 phases:
- Ingest Phase – Dead tuples are read into
maintenance_work_memis exhausted, the
VACUUMproceeds with the next two phases, then comes back to this phase and picks up where it left off.
- Pruning Phase – Index entries pointing to dead tuples are removed.
- Cleanup Phase – Actual line pointers between item pointers and tuples in a data page are removed
If you think about a
VACUUM in these terms, we can infer some behavioral patterns for each phase. Moreover, depending on which phase a
VACUUM is in, getting impatient and hitting
Ctrl-C may be counter-productive and somewhat risky.
When trying to figure out what a VACUUM is doing at the moment, it’s very important to have the process ID of the
psql session or
vacuumdb process. You can easily get that with:
In this example, the
VACUUM command is being run by a
psql session with PID 14404. Keep this PID in handy so that you can figure out which files are being touched by this process.
Deciphering the Data
Note that Phase 1 (the Ingest Phase) is all about reading data into memory. It’s a scan of each file related to a table (if the on-disk footprint is greater than 1GB, a file gets split up into 1GB segments). As such, if you’re watching
vmstat, you’ll probably see a fair amount of activity in the
bi column of the
io section–you may also see some iowait (the
wa column under the
cpu section). The
cache column of the
memory section should steadily increase up to the point of your
maintenance_work_mem setting (plus whatever was in the cache prior to starting the
VACUUM process). The
free column of the
memory section should steadily decrease as well.
Once all a table’s dead tuples are read into memory (or when
maintenance_work_mem gets exhausted), the
VACUUM process moves on to Phase 2 (the Pruning Phase). Here, you’ll see write activity against an index’s file(s). You should see some activity in
bo of the
io column as this is happening. To determine that it’s disk activity on an index, you’ll need to perform a brief
strace against the processID of the
VACUUM session. You’ll also need a mapping of indexes to tables, which you can get with the following query:
Note: this does not map to TOAST tables and their indexes (this is left as an exercise for the reader).
Once you’ve got a sampling of
strace output, you can proceed to figure out which by mapping the file descriptor in
strace to the file linked in
/proc/<vacuum_processID>. The file it’s pointing to–if it’s an index, you’re on Phase 2.
In the above
strace example, 515 and 508 are file descriptors. Do
ls -l /proc/14404/ | grep 515 and
ls -l /proc/14404/ | grep 508 to see which files are actually being accessed.
VACUUM is done with Phase 2, it moves on to Phase 3 (the Cleanup Phase), where it cuts the links between items and tuples on a disk page. If you don’t recall what a disk page looks like, an image is provided below. Note the red lines–Phase 3 removes the red lines and pink boxes (only if they’re marked as dead), and updates the green box to point to an available yellow box.
Phase 3 is the most significant part of the
VACUUM process, and actually completes the work of freeing up space. It’s imperative not to abort this phase (and it’s probably not good to abort Phase 2 either), as you’ll have to re-do a lot of the legwork accomplished in the earlier phases.
To see if you’re on Phase 3, you’ll see a lot of write activity in
vmstat. On a server with nothing else running besides Postgres, you should see high numbers in the
bo column, and nearly zero in the
bi column of the
io section. Once Phase 3 is done,
VACUUM will move on to the next table (or continue on the current table if Phase 1 filled up
maintenance_work_mem). Once all tables are
VACUUM process is done.
That’s about it, in a nutshell. Note that this is all that happens for a regular
VACUUM ANALYZE will perform an
ANALYZE after each table is done getting
VACUUMed. Also, for a
VACUUM FULL, the behavior is slightly different, since the indexes get discarded, and the entire file gets re-written in a new location. Perhaps those details may be explained in a future post.