EDB Replication Server empowers DBAs with high-performance Multi-Master Replication (MMR) with the help of Postgres’ Logical Decoding framework. While performance is a signifiance improvement from the xDB 5.x trigger-based replication architecture, DBAs and sysadmins still seek to monitor and track replication lag. There are a number of ways to do this, and we’ll go over them.
Row Replication Lag
Much like using
pg_stat_replication when employing Postgres’ built-in Streaming Replication, the most accurate form of replication lag monitoring is to see how many bytes (in this case, rows) a standby/target databse is behind its provider. The way to measure this is by executing the following query on the control databse (in most cases, the MDN):
Note that this is the same query that Postgres Enterprise Manager’s (PEM) XDB Replication Probe uses. If you’re already using PEM, you can just enable the XDB Replication Probe, and you’ll be able to monitor the lag from the PEM client.
Time Replication Lag
For most people, they’re not interested in the number of bytes or rows a replication cluster is lagging behind. Instead, they’re more interested in how long it takes for ALL the unreplicated data to get to the other side. For this, there’s no real way to get a measurement without having a third-party monitoring to see if a change in one server has appeared in another server. Moreover, in an MMR situation, there are complexities in defining replication lag in terms of wallclock time, since all nodes are Master Databases.
One way to measure time lag in an EDB Replication Server cluster involves a few tricks, and the setup looks like this:
- A table dedicated to testing replication performance (we’ll call it
- A table tabulating the time lag history (we’ll call it
- A function returning a trigger (named
- A trigger on the
First, we create the
Then, we add it to the publication (in this case, I’m creating a new publication, since it’s a new cluster in my environment):
After we verify that replication is working in both directions, create the
xdb_lag_history table, trigger and function on the MDN:
It is very important to note the
ALTER TABLE statement that sets
ENABLE ALWAYS TRIGGER – this prevents the DML coming from the non-MDN from being ignored by the trigger. Without this trigger enabled, you’ll never get any data inserted into
INSERT a row into
xdb_lag_test on any of your non-MDNs:
Wait a few seconds for the data to replicate into the MDN, and then go check:
In this system, there was a 4.36sec lag in propagating the
INSERT statement. Now, we can write to all the non-MDNs and see how much time it takes for data to replicate into the MDN. From here, other instrumentation can be made to track time lag in EDB Replication Server.