The Most-Neglected Postgres Feature?
Introduction
I recently did some work with a customer who had some strange behavior happening in his database. When I asked for his logs, I found that each line had a message and just one timestamp prefixed to it. In other words, he had log_line_prefix = '%t '. This made it hard for me to figure out who did what, especially as his database was serving many clients and applications. It got me thinking, and I scoured through our other clients’ postgresql.conf files that had been shared with me over the years. In ~140 conf files, I found the following:
- 5% of users don’t change log_line_prefix
- 7% of users don’t log a timestamp (but they might be using syslog, which would include its own timestamp)
- 38% only log a timestamp (and nothing else)
- The average number of parameters included in log_line_prefix is 0.93
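Back to the customer’s setting: with log_line_prefix = '%t ', every entry looks roughly like the made-up line below, with no user, database, application, or process ID to pivot on:

    2024-01-01 12:00:00 UTC ERROR:  duplicate key value violates unique constraint "orders_pkey"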
A bit of history
Wait a minute. On average, less than one parameter in log_line_prefix for any postgresql.conf? How could that be? Bear in mind that prior to v. 10, the default for log_line_prefix was simply ''. That’s right: nothing. It was up to the DBA to set a value (as is often the case with the default conf which, understandably, caters to the lowest common denominator). Seeing that this wasn’t very useful, Christoph Berg submitted a patch to set the default to '%m [%p] '. While it’s not the best setting, it’s a significant improvement over nothing at all. What this does tell me, though, is that many users out there on v. 9.x have not bothered to change log_line_prefix at all, making this perhaps one of the most neglected important features PostgreSQL has to offer.
EDIT: Some of these conf files were from EDB Postgres Advanced Server (EPAS) deployments. EPAS has been shipping with log_line_prefix = '%t ' by default since 2012, so those 38% of users who log only a timestamp are users who don’t change log_line_prefix, possibly making the statistic more like “43% of users don’t bother to change log_line_prefix.”
More important than some may think
Adequate logging opens the door to many possibilities. With log_connections and log_disconnections, you can see when a session began and ended. With log_min_duration_statement (along with auto_explain), you can identify any poorly-running queries. With log_autovacuum_min_duration, you can see what an autovacuum job did and how much space it freed up, and it might tip you off to any stray/idle transactions preventing you from vacuuming more. The same goes for log_temp_files, which can tip you off to any work_mem adjustments you may need. However, in order for any of this to be possible, log_line_prefix needs to be adequately set.
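To make the slow-query side of that concrete, here is a minimal postgresql.conf sketch; the 250 ms threshold is an arbitrary example, and auto_explain has to be loaded through shared_preload_libraries (which requires a restart) before its settings apply:

    # Log any statement that runs longer than 250 ms (example threshold)
    log_min_duration_statement = 250

    # Load auto_explain so the plans of those slow statements are logged too
    shared_preload_libraries = 'auto_explain'    # requires a restart
    auto_explain.log_min_duration = 250          # milliseconds, matching the above
    auto_explain.log_analyze = off               # 'on' adds detail but also overhead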
log_line_prefix can log many important facets of a session or query. There are 17 parameters that can be logged, and while not all of them need to be included in your postgresql.conf, here are some of my favorites:
- %a - Application Name - Allows quick reference and filtering
- %u - User Name - Allows filtering by user name
- %d - Database Name - Allows filtering by database name
- %r - Remote Host IP/Name - Helps identify suspicious activity from a host
- %p - Process ID - Helps identify specific problematic sessions
- %l - Session/Process Log Line - Helps identify what a session has done
- %v/%x - Transaction IDs - Helps identify what queries a transaction ran
These, along with a timestamp (%m or %t), make it possible for a DBA or developer to quickly filter on specific parameters to identify issues and collect historical data. Moreover, log analytics tools like pgbadger work best with a more comprehensive log_line_prefix, so something like log_line_prefix = '%m [%p:%l] (%v): host=%r,db=%d,user=%u,app=%a,client=%h ' is what I like to use. With this, I can do the following (a few example commands follow the list):
- grep on [<pid>] to find lines pertaining to a specific active backend and see what it’s done so far (cross-referencing with SELECT * FROM pg_stat_activity)
- grep on a PID to see what it did before the database crashed
- Filter out lines with a specific application name (like autovacuum, because I know for a fact that autovacuum didn’t cause the problem I’m currently trying to investigate)
- Filter out a specific database name, because in my multi-tenant setup I don’t need to worry about that database
- Focus on a specific transaction ID to help my developer know at which step in his code a query is failing
- And more …
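A rough sketch of the first few items; the filename, PID, application name, and transaction ID below are all made up, and the patterns assume the prefix shown above:

    # Everything a particular backend (PID 12345) has logged, in order
    grep '\[12345:' postgresql-2024-01-01_120000.log

    # Drop all lines from one application so I can focus on the rest
    grep -v 'app=reporting_batch' postgresql-2024-01-01_120000.log

    # Follow a single transaction by its virtual transaction ID
    grep '(3/4567)' postgresql-2024-01-01_120000.log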
Without setting log_line_prefix, all you have is a bunch of timestamps and a bunch of queries or error messages, with no idea how they relate to the set of users and applications that might be accessing the database. Setting log_line_prefix will lead to quicker diagnosis, faster incident resolution, happier users, and rested DBAs.
That’s not all!
While setting log_line_prefix is very important, from experience I also think the following help DBAs maintain their sanity (a sample of these settings follows the list):
- log_min_duration_statement – helpful in identifying slow queries
- log_statement – good for auditing purposes
- log_[dis]connections – good for auditing purposes
- log_rotation_age/log_rotation_size – good for organization, and for keeping your logfiles small(ish)
- log_autovacuum_min_duration – gives insight into autovacuum behavior
- log_checkpoints – know what queries happened between checkpoints
- log_temp_files – helps identify work_mem shortages, I/O spikes
- auto_explain – not a parameter, but a useful extension
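For reference, a hedged sketch of what these might look like in postgresql.conf; every value below is an arbitrary starting point, not a recommendation for your workload:

    logging_collector = on                  # have PostgreSQL manage its own logfiles
    log_min_duration_statement = 500        # milliseconds; example threshold
    log_statement = 'ddl'                   # log schema changes; 'all' is much noisier
    log_connections = on
    log_disconnections = on
    log_rotation_age = 1d                   # rotate daily (or hourly on busy systems)
    log_rotation_size = 100MB               # also rotate when a file grows this large
    log_autovacuum_min_duration = 0         # log every autovacuum run
    log_checkpoints = on
    log_temp_files = 0                      # log every temp file, along with its size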
These parameters will help with diagnosis and, in some cases, when coupled with pgbadger, pganalyze, or tail_n_mail, can assist with capacity planning.
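As an example of that coupling, a daily pgbadger run might look something like this; the paths are made up, and the --prefix value should match whatever you actually set:

    # Build an HTML report from one day of logs, telling pgbadger about the custom prefix
    pgbadger --prefix '%m [%p:%l] (%v): host=%r,db=%d,user=%u,app=%a,client=%h ' \
             -o /tmp/pgbadger-report.html /var/log/postgresql/postgresql-2024-01-01_*.log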
How much is too much?
Some may complain that logging too many things will lead to I/O overhead and actually increase time for diagnosis. DBAs don’t want to be drinking from a firehose at 3AM! While this may be true, some steps can be taken to mitigate these effects:
- Rotate your logfiles every hour (or more frequently on super-busy systems)
- Compress your logfiles after they’ve been rotated (a sketch of these two steps follows this list)
- Put your log_directory on a separate partition
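Here is a minimal sketch of the first two points, assuming the built-in logging collector and an external cron job for compression (PostgreSQL doesn’t compress rotated logs itself); the paths and schedule are examples only:

    # postgresql.conf: rotate every hour, with hour-stamped filenames
    logging_collector = on
    log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
    log_rotation_age = 1h
    log_rotation_size = 0          # rotate on time only

    # cron job: compress rotated logs that haven't been written to for an hour
    find /var/log/postgresql -name 'postgresql-*.log' -mmin +60 -exec gzip {} \;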
Getting familiar with commandline tools like grep, sed, and awk is also very important, so that you can quickly filter and zoom in on the suspected users, transactions, and processes.
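For instance, assuming log_min_duration_statement is enabled, a one-liner like this surfaces the slowest entries in a (made-up) logfile:

    # Strip everything up to the logged duration, sort numerically, show the 10 slowest
    grep 'duration:' postgresql-2024-01-01_120000.log \
      | sed 's/.*duration: //' \
      | sort -rn \
      | head -10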
Conclusion
While a company’s bottom line is often correlated with efficiency, performance, and throughput, there’s no excuse for inadequate logging. Good logging saves valuable time, and time is money. PostgreSQL’s logging is very powerful and informative, even without third-party log processing tools like Datadog and Splunk. It’s the first and most powerful resource (along with looking at pg_stat_activity) that DBAs have when it comes to figuring out what caused problems in your application stack, and the tell-all snitch when it comes to investigating a database crash. Don’t neglect it!