Troubleshooting "Database streaming channel is broken between HA nodes" error on ADM HA pair

Article ID: CTX239798

Description

The issue started occurring after an upgrade of ADM.

This issue applies only to ADM devices deployed in HA on versions 12.1 and 13.0.

Resolution

Starting with the ADM on-prem 13.0 71.x release, if database streaming between the nodes in an HA deployment fails, you can click the Sync Database tab under System > Deployment > High Availability Deployment in the ADM GUI to restore the database.

Alternatively, if you are on a version earlier than 13.0 71.x, use the following procedure:

  • Log in to the CLI of the Primary ADM with the username nsrecover and the password of the nsroot user, then run the following commands:
# cd /var/mps/db_pgsql/data/pg_log
# chown mpspostgres *
  • Log in to the CLI of the Secondary ADM with the username nsrecover and the password of the nsroot user.
  • Run the command that matches your ADM version, replacing SecondaryIP and PrimaryIP with the actual IP addresses of the nodes.
If the ADM version is earlier than 12.1-59.x or 13.0-64.x:
# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP nsroot > /var/mps/log/join_streaming_replication_console.log 2>&1 &
If the ADM version is 12.1-59.x, 13.0-64.x, or later:
# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP > /var/mps/log/join_streaming_replication_console.log 2>&1 &
  • Monitor the output of the command in the log file under /var/mps/log:
# tail -f /var/mps/log/join_streaming_replication_console.log
  • Wait for a few hours, then confirm that the HA channel is UP by running the following command on the Secondary:
# ps -ax | grep -i wal
  • A line like the following in the output confirms that the channel is UP:
??  Ss     0:14.14 postgres: wal receiver process   streaming
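
The final check can be scripted rather than rerun by hand. The following is a minimal sketch, not part of the documented procedure: is_streaming and wait_for_streaming are hypothetical helper names, and the attempt count and delay are assumed values to tune as needed. It relies only on ps, grep, and sleep, and looks for the same "wal receiver ... streaming" line shown above.

```shell
#!/bin/sh
# Sketch only: helper names and timing values are assumptions, not ADM tooling.

# Reads "ps -ax" style output on stdin.
# Succeeds (exit 0) if a WAL receiver process is listed in the streaming state.
is_streaming() {
    # "grep -v grep" drops any grep command that shows up in the listing itself.
    grep -i 'wal receiver' | grep -v grep | grep -qi 'streaming'
}

# Polls the local process list until the WAL receiver is streaming.
#   $1 = number of checks to make
#   $2 = seconds to wait between checks
wait_for_streaming() {
    attempts=$1
    delay=$2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ps -ax | is_streaming; then
            echo "HA database streaming channel is UP"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "WAL receiver is not streaming yet"
    return 1
}
```

For example, running wait_for_streaming 60 60 on the Secondary would check once a minute for about an hour. If it still reports that the WAL receiver is not streaming, review /var/mps/log/join_streaming_replication_console.log before retrying.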

Problem Cause

After an upgrade, the PostgreSQL process on the Secondary node sometimes fails to start, which results in database sync issues.

Issue/Introduction

The error appears in the ADM GUI when a user logs in to the device.