Aurora Postgres Rolling Buffer Data Loss

Hi All,

I’m trying to minimize data loss across a few different scenarios. Has anyone managed to get lossless historical data during store-and-forward events or during refresh/rebirths, particularly with Aurora Postgres? Below is my scenario, including my current understanding, which I’m hoping to have corrected if wrong.

I’m working with redundant Ignition gateways with MQTT Transmission, a pair of Chariot brokers working as a set, redundant Ignition backend gateways with MQTT Engine logging tag history to Aurora Postgres, and a pair of Ignition frontend gateways for Perspective.

If I force a shutdown of the master backend gateway, I see some data loss while the backup gateway becomes active. The same happens if I block comms or disable MQTT Engine. Following this article, Minimizing data loss when using MQTT Store and Forward - MQTT Modules for Ignition 8.1 - Confluence, it seems some data loss is to be expected, as even reducing the keep-alive time could still result in 8 seconds of data loss. I have a primary host ID configured, so I believe the keep-alive time of Engine is the driving factor and the keep-alive time of Transmission is not used.

Enabling the rolling buffer in Transmission should overcome that 8 seconds of data loss, assuming the rolling buffer is at least 2x the keep-alive time. As referenced in this article, MQTT Transmission History Store - Rolling History Buffer - MQTT Modules for Ignition 8.1 - Confluence, the rolling buffer may send duplicate data, and this may cause errors recording data to history, but data should not be lost. Unfortunately, for me the errors are putting all historical data into quarantine, and none of it is reaching the DB. I.e., essentially no data is being backfilled after the connection is restored when the rolling buffer is enabled.

I did some testing on a simpler local setup and found that MS SQL performed almost perfectly, in that the data was still recorded in the DB. However, data also went to quarantine, so I’m assuming that would grow and grow over time.

I also tested locally with a plain Postgres DB and got data loss as if the rolling buffer were disabled. This also produced quarantined items. I haven’t found a way to reproduce it, but I was occasionally losing small chunks of data in the middle of the backfill.

I have also realised I’m getting some data loss when triggering the Transmission Refresh tag. Are there any options for overcoming this?

If you made it this far, thanks for reading!

Ignition v8.1.48, Engine/Transmission 4.0.29

An example of enabling/disabling MQTT Engine with Aurora Postgres and the MQTT Engine keep-alive set to 5s.

The rolling buffer had some issues in 4.0.29. Specifically, it published BIRTHs using metric timestamps set to the reconnect time, which in turn caused MQTT Engine to throw out all of the history because it was older than ‘now’. If you upgrade to 4.0.30, you should see an improvement.
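To illustrate why that timestamp bug wipes out the whole backfill, here is a hypothetical simplification (the function and names are mine, not Engine internals): if the BIRTH is stamped with the reconnect time and anything older than that stamp is treated as stale, every sample buffered during the outage is necessarily older and gets dropped.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    ts: int      # epoch seconds
    value: float

def drop_stale(birth_ts: int, history: list[Metric]) -> list[Metric]:
    # Simplified model of the described 4.0.29 behaviour:
    # anything older than the BIRTH timestamp is discarded as stale.
    return [m for m in history if m.ts >= birth_ts]

# Samples buffered while the connection was down (ts 100..110).
outage_history = [Metric(100, 1.0), Metric(105, 2.0), Metric(110, 3.0)]

# Bug: BIRTH stamped with the reconnect time (120, after the outage).
print(len(drop_stale(120, outage_history)))  # 0 -> entire backfill discarded

# Corrected stamping: the buffered history survives.
print(len(drop_stale(100, outage_history)))  # 3
```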

We have dev/test/prod environments. The above screenshot was taken in Test. Dev still has 4.0.29 throughout, but it looks like MQTT Transmission in Test had already been updated and was running 4.0.30. I did just update MQTT Engine to 4.0.30 as well, but I still get the same results.

For anyone stumbling across this thread in the future:
Cirrus Link helped identify some extra connection properties for Postgres database connections that stop duplicate data (a natural byproduct of using the rolling buffer) from blocking non-duplicate data going to the database.
This was noted here: Configuring history on MQTT Engine tags - MQTT Modules for Ignition 8.1 - Confluence
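I can't speak to the exact JDBC properties Cirrus Link recommended (see the linked Confluence page for those), but the underlying idea is to make duplicate rows a no-op instead of a batch-aborting error. In Postgres that is `INSERT ... ON CONFLICT DO NOTHING`; a sketch of the effect, again using stdlib sqlite3 (which supports the same clause) as a stand-in, with a hypothetical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (ts INTEGER PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO history VALUES (100, 1.0)")  # row already stored

# Same rolling-buffer backfill batch as before, with one duplicate (ts=100).
batch = [(100, 1.0), (101, 2.0), (102, 3.0)]
with conn:
    conn.executemany(
        "INSERT INTO history VALUES (?, ?) ON CONFLICT DO NOTHING", batch
    )

rows = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(rows)  # 3 -> the duplicate is skipped and the new rows still land
```

With duplicates silently skipped, the backfill completes and nothing needs to be quarantined.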