Troubleshooting "Database streaming channel is broken between HA nodes" error on ADM HA pair

Article ID: CTX239798

Description

The error starts to appear after the ADM HA pair is upgraded.

This issue applies only to ADM devices deployed in HA on versions 12.1 and 13.0.

Resolution

Starting with ADM on-prem release 13.0 71.x, if database streaming between the nodes of an HA deployment fails, you can click the Sync Database tab under System > Deployment > High Availability Deployment in the ADM GUI to restore the database.

Alternatively, if you are on a version earlier than 13.0 71.x, use the following procedure:

Access the CLI of the Primary ADM with the username nsrecover and the password of the nsroot user, then run the following commands:

# cd /var/mps/db_pgsql/data/pg_log

# chown mpspostgres *
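
As an optional check after the chown step, the following minimal sketch lists any entries in a directory that are still not owned by a given user. The function name list_not_owned_by is hypothetical and is not part of ADM; it only assumes a find implementation that supports -mindepth and -maxdepth (GNU and FreeBSD find both do).

```shell
#!/bin/sh
# Hypothetical helper, not shipped with ADM.
# Prints every direct entry of DIR that is NOT owned by OWNER.
# Empty output means ownership is correct for all entries.
list_not_owned_by() {
    owner="$1"
    dir="$2"
    find "$dir" -mindepth 1 -maxdepth 1 ! -user "$owner"
}
```

For example, `list_not_owned_by mpspostgres /var/mps/db_pgsql/data/pg_log` should print nothing once the chown has taken effect.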

Access the CLI of the Secondary ADM with the username nsrecover and the password of the nsroot user, then run the command that matches your ADM version. Replace SecondaryIP and PrimaryIP with the actual IP addresses of the nodes.

If the ADM version is earlier than 12.1-59.x or 13.0-64.x:

# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP nsroot > /var/mps/log/join_streaming_replication_console.log 2>&1 &

If the ADM version is 12.1-59.x, 13.0-64.x, or later:

# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP > /var/mps/log/join_streaming_replication_console.log 2>&1 &
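
The version rule above can be sketched as a small helper that prints the matching command without running it. This is only an illustration: the function names needs_user_arg and build_join_command are hypothetical, the release and build number are supplied by hand (for example 13.0 and 61 for 13.0-61.x), and the thresholds are taken directly from the two cases above.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with ADM.

# Prints "yes" if the join script needs the trailing nsroot argument,
# "no" otherwise. RELEASE is "12.1" or "13.0"; BUILD is the build
# number that follows the dash (59 in 12.1-59.x).
needs_user_arg() {
    release="$1"
    build="$2"
    case "$release" in
        12.1) threshold=59 ;;
        13.0) threshold=64 ;;
        *) echo "unknown"; return 2 ;;
    esac
    if [ "$build" -lt "$threshold" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Prints (does not execute) the command for the given node IPs and version.
build_join_command() {
    secondary_ip="$1"
    primary_ip="$2"
    script=/mps/scripts/pgsql/join_streaming_replication.sh
    log=/var/mps/log/join_streaming_replication_console.log
    need=$(needs_user_arg "$3" "$4") || return 2
    if [ "$need" = "yes" ]; then
        echo "nohup sh $script $secondary_ip $primary_ip nsroot > $log 2>&1 &"
    else
        echo "nohup sh $script $secondary_ip $primary_ip > $log 2>&1 &"
    fi
}
```

For example, `build_join_command 192.0.2.11 192.0.2.10 13.0 61` prints the variant with the nsroot argument; the two addresses are placeholder documentation IPs and must be replaced with the real Secondary and Primary addresses.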

Monitor the output of the above command in /var/mps/log:

# tail -f /var/mps/log/join_streaming_replication_console.log

Wait a few hours, then confirm that the HA channel is UP by running the following command on the Secondary:

# ps -ax | grep -i wal

You should see a line like the following, which confirms that the channel is UP:

??  Ss     0:14.14 postgres: wal receiver process   streaming
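
Because the channel can take hours to come up, the check can be repeated automatically instead of by hand. The sketch below is one possible way to do that; the function names is_streaming and wait_for_streaming are hypothetical, and the match simply looks for a "wal receiver" process line that also contains "streaming", as in the sample line above.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with ADM.

# Returns 0 if the ps output passed as $1 contains a WAL receiver
# line in the streaming state, non-zero otherwise.
is_streaming() {
    printf '%s\n' "$1" | grep -i 'wal receiver' | grep -qi 'streaming'
}

# Checks the process list up to TRIES times, INTERVAL seconds apart.
wait_for_streaming() {
    tries="$1"
    interval="$2"
    while [ "$tries" -gt 0 ]; do
        if is_streaming "$(ps -ax)"; then
            echo "HA database channel is UP"
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    echo "WAL receiver is not streaming yet"
    return 1
}
```

For example, `wait_for_streaming 36 600` on the Secondary would check every 10 minutes for up to roughly six hours; both values are arbitrary and can be adjusted.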

Problem Cause

After an upgrade, the PostgreSQL process on the Secondary sometimes fails to start, which results in database sync issues.

Issue/Introduction

The error is displayed in the ADM GUI when a user logs in to the device.
