Troubleshooting "Database streaming channel is broken between HA nodes" error on ADM HA pair

Article ID: CTX239798

Description

The error starts occurring after the ADM HA pair is upgraded.

This issue applies only to ADM devices deployed in HA on versions 12.1 and 13.0.

Resolution

Starting with the ADM on-prem 13.0 71.x release, if database streaming between the nodes in an HA deployment fails, you can restore the database by clicking the Sync Database tab under System > Deployment > High Availability Deployment in the ADM GUI.

Alternatively, if you are on a version earlier than 13.0 71.x, use the following procedure:

  • Access the CLI of the primary ADM using the username nsrecover and the password of the nsroot user, then run the following commands:
# cd /var/mps/db_pgsql/data/pg_log
# chown mpspostgres *
  • Access the CLI of the secondary ADM using the username nsrecover and the password of the nsroot user.
  • Run the command that matches your ADM version, replacing SecondaryIP and PrimaryIP with the actual IP addresses of the nodes.
If the ADM version is earlier than 12.1-59.x or 13.0-64.x:
# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP nsroot > /var/mps/log/join_streaming_replication_console.log 2>&1 &
If the ADM version is 12.1-59.x, 13.0-64.x, or later:
# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP > /var/mps/log/join_streaming_replication_console.log 2>&1 &
  • Monitor the output of the command in its log file under /var/mps/log:
# tail -f /var/mps/log/join_streaming_replication_console.log
  • Wait a few hours, then confirm that the HA channel is UP by running the following command on the secondary:
# ps -ax | grep -i wal
  • A line like the following in the output confirms that the channel is UP:
??  Ss     0:14.14 postgres: wal receiver process   streaming
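
The final check above can be repeated automatically instead of being run by hand. The following is a minimal sketch, not part of ADM: the function names wal_receiver_streaming and wait_for_streaming are hypothetical, and the sketch assumes that the "wal receiver ... streaming" line shown above is a sufficient indicator that the channel is UP.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ADM): reads `ps` output on stdin and
# succeeds (exit status 0) only if a WAL receiver line reports "streaming".
# The [w] bracket keeps the pattern from matching grep's own command line
# when that command line shows up in the `ps` listing.
wal_receiver_streaming() {
    grep -qi '[w]al receiver.*streaming'
}

# Hypothetical wrapper: polls `ps -ax` until the WAL receiver is streaming
# or the attempts run out.
# $1 = number of attempts, $2 = seconds to sleep between attempts.
wait_for_streaming() {
    attempts=$1
    interval=$2
    while [ "$attempts" -gt 0 ]; do
        if ps -ax | wal_receiver_streaming; then
            echo "HA channel is UP"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    echo "WAL receiver is not streaming yet"
    return 1
}
```

On the secondary, after the functions are defined in the shell session, `wait_for_streaming 12 300` would check every five minutes for roughly an hour. Adjust both values to match how long the resynchronization is expected to take.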

Problem Cause

After an upgrade, the PostgreSQL process on the secondary node sometimes fails to start, which results in database sync issues.

Issue/Introduction

The error is displayed in the ADM GUI when a user logs in to the device.