How to debug and resolve ADM-HA DB Streaming broken issues

Article ID: CTX327923

Description

1. /var/mps/db_pgsql/data/pg_hba.conf file corruption

This issue occurs when the /var/mps/db_pgsql/data/pg_hba.conf file is missing the IP address of the primary or secondary node.

The following is a sample pg_hba.conf file for ADM version 12.1 60.16 and 13.0 64.35 with correct entries: 

   

Resolution 

Follow the steps below to resolve this issue.

If the ADM version is later than 12.1 60.16 or 13.0 64.35, do the following:

  1. Log in to the ADM primary node shell as the nsrecover user.
  2. Run the following commands, replacing Missing_ADM_IP with the missing node IP address:
    • echo "hostssl replication masrepuser Missing_ADM_IP/32 cert clientcert=1" >> /var/mps/db_pgsql/data/pg_hba.conf
    • echo "hostssl mpsdb pgrewind Missing_ADM_IP/32 cert clientcert=1" >> /var/mps/db_pgsql/data/pg_hba.conf
  3. Reload PostgreSQL using the following command:
    • su -l mpspostgres -c 'sh /mps/scripts/pgsql/reloadpgsql.sh'
  4. Follow the steps in https://support.citrix.com/article/CTX239798 or, on ADM versions later than 13.0 67.39, click the DB sync button (System-Deployment).
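
Before appending anything, it can help to confirm that the entry really is missing. The following is a minimal sketch, not an official ADM tool: the function name has_replication_entry is hypothetical, and it only looks for the masrepuser replication line in the format shown in step 2.

```shell
# Path taken from the article.
HBA_FILE=/var/mps/db_pgsql/data/pg_hba.conf

# has_replication_entry <node-ip> [file]   (hypothetical helper)
# Succeeds if the file has a line of the form:
#   hostssl replication masrepuser <node-ip>/32 ...
# Comment lines are skipped automatically because their first field is not "hostssl".
has_replication_entry() {
    awk -v cidr="$1/32" '
        $1 == "hostssl" && $2 == "replication" && $3 == "masrepuser" && $4 == cidr { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-$HBA_FILE}"
}

# Example use on the node:
#   has_replication_entry 10.0.0.2 || echo "replication entry for 10.0.0.2 is missing"
```

The IP address 10.0.0.2 in the example is a placeholder for the primary or secondary node IP.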

 The following is a sample pg_hba.conf file for ADM version 12.1 60.16 and 13.0 64.35 with correct entries: 

 

Resolution

Follow the steps below to resolve this issue.

If the ADM version is earlier than 12.1 60.16 or 13.0 64.35, do the following:

  1. Log in to the ADM primary node shell as the nsrecover user.
  2. Run the following command, replacing Missing_ADM_IP with the missing node IP address:
    1. echo "hostssl replication masrepuser Missing_ADM_IP/32 md5" >> /var/mps/db_pgsql/data/pg_hba.conf
  3. Reload PostgreSQL using the following command:
    1. su -l mpspostgres -c 'sh /mps/scripts/pgsql/reloadpgsql.sh'
  4. Follow the steps in https://support.citrix.com/article/CTX239798 or, on ADM versions later than 13.0 67.39, click the DB sync button (System-Deployment).
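
Running the echo command in step 2 more than once leaves duplicate lines in pg_hba.conf. As a hedged sketch, the append can be made repeatable by checking for the exact line first. The helper name append_hba_line is an assumption, not part of ADM.

```shell
# append_hba_line <file> <line>   (hypothetical helper)
# Appends <line> to <file> only if an identical line is not already there.
append_hba_line() {
    if grep -qxF -- "$2" "$1"; then
        echo "already present"
    else
        printf '%s\n' "$2" >> "$1"
        echo "added"
    fi
}

# Example use on the primary node (10.0.0.2 is a placeholder IP):
#   append_hba_line /var/mps/db_pgsql/data/pg_hba.conf \
#       "hostssl replication masrepuser 10.0.0.2/32 md5"
```

The match is exact, so an existing entry with different spacing is not detected and the line would still be appended.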

2. Disk size mismatch between primary and secondary ADM-HA nodes 

This issue occurs when the disk sizes of the ADM HA primary and secondary nodes are different.  

Resolution

Maintain the same disk size on both nodes.

Use the following command to compare the disk sizes of the two nodes:

  1. Log in to the ADM shell prompt on each node.
  2. Run the command "df -h".
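
Human-readable df -h output is rounded, so small differences can be hidden. The sketch below compares exact sizes in kilobytes instead. It assumes that /var/mps is the file system of interest and that df -Pk output is collected on each node; the function names are hypothetical.

```shell
# fs_size_kb   (hypothetical helper)
# Reads `df -Pk <path>` output on stdin and prints the total size in KB
# (second column of the first data row).
fs_size_kb() {
    awk 'NR == 2 { print $2 }'
}

# sizes_match <kb-node-a> <kb-node-b>   (hypothetical helper)
# Succeeds when both sizes are numerically equal.
sizes_match() {
    [ "$1" -eq "$2" ]
}

# Example use:
#   df -Pk /var/mps | fs_size_kb      # run on each node, note both values
#   sizes_match 125829120 125829120 && echo "disk sizes match"
```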

3. ADM version mismatch between primary and secondary ADM-HA nodes

This issue occurs when the ADM HA primary and secondary nodes have different versions. 

Resolution

Maintain the same build version on both nodes.

Use the following command to check the version of each node:

  1. Log in to the ADM shell prompt.
  2. Run the command "cat /mps/version.conf".
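
If the output of the command is saved from each node, the two copies can be compared byte for byte. This is a minimal sketch under that assumption; the helper name same_version and the file names in the example are hypothetical.

```shell
# same_version <copy-from-primary> <copy-from-secondary>   (hypothetical helper)
# Prints "match" when the two saved copies of /mps/version.conf are identical.
same_version() {
    if cmp -s "$1" "$2"; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Example use, after saving `cat /mps/version.conf` from each node:
#   same_version primary_version.conf secondary_version.conf
```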

4. Disk-Full issues  

This issue occurs when either of the ADM HA nodes runs out of disk space.
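
To confirm this condition on a node, the df output can be filtered for file systems that are at or above a usage threshold. The following is a sketch only: the helper name full_filesystems and the 90 percent default are assumptions, not ADM limits.

```shell
# full_filesystems [threshold-percent]   (hypothetical helper)
# Reads `df -Pk` output on stdin and prints "<mount point> <capacity>" for every
# file system whose usage is at or above the threshold (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}

# Example use on each node:
#   df -Pk | full_filesystems 90
```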

Resolution

Increase the disk size by attaching an additional disk. Follow the steps from the link below to add an additional disk:

If a secondary disk has been attached to ADM, check the file system sizes of /var and /var/mps. The following is sample output from ADM with and without a secondary disk attached.

 

 

5. Split-Brain scenarios  

This issue occurs when both ADM HA nodes act as a primary node. Sometimes, ADM tries to recover on its own. If it fails to do so for any reason, ADM DB streaming gets broken. 

Resolution

Follow the steps in https://support.citrix.com/article/CTX239798 or, on ADM versions later than 13.0 67.39, click the DB sync button (System-Deployment).

6. Insufficient hardware  

This issue occurs when the hardware does not meet the requirements for the deployed scenario.

Resolution

Follow the steps in the link below: 

7. Secondary node of ADM HA shows out of service in the GUI 

This issue occurs when the /mpsconfig/mas_hb_monit.conf file gets corrupted.

The following is a sample of a correct /mpsconfig/mas_hb_monit.conf file:

 
Ensure that the secondary node IP address is specified as "peer_ip". Specify the node IP addresses in their respective primary and secondary node shells, and assign the virtual IP as the floating IP address of the ADM HA pair.

Resolution

Do the following steps: 

  1. Log in to the primary and secondary ADM shells as the nsrecover user.
  2. Check whether the peer IP and virtual IP are missing from the /mpsconfig/mas_hb_monit.conf file.
  3. Edit the file and enter the correct values.
  4. Follow the steps in https://support.citrix.com/article/CTX239798 or, on ADM versions later than 13.0 67.39, click the DB sync button (System-Deployment).

8. The mas_hb_monit process is unable to send or receive heartbeats

This issue occurs when the process named "mas_hb_monit" gets stuck.

Resolution

Do the following steps: 

  1. Log in to the ADM primary node shell as the nsrecover user.
  2. Run the following command to get the <PID> of the mas_hb_monit process:
     ps -ax | grep mas_hb_monit
  3. Kill the mas_hb_monit process using the <PID> obtained in the previous step:
     kill -9 <PID>
  4. Wait for some time. The mas_hb_monit process restarts automatically with a new <PID>.
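
The grep command in the steps above also lists itself in the ps output, which makes it easy to pick the wrong PID. A minimal sketch of a filter that skips such rows is shown here; the helper name hb_monit_pids is an assumption.

```shell
# hb_monit_pids   (hypothetical helper)
# Reads `ps -ax` output on stdin and prints the PID (first column) of each
# mas_hb_monit row. The [m] bracket keeps this awk process from matching
# itself, and rows belonging to a grep command are skipped.
hb_monit_pids() {
    awk '/[m]as_hb_monit/ && !/grep/ { print $1 }'
}

# Example use on the primary node:
#   ps -ax | hb_monit_pids
```

Review the printed PID before passing it to kill.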

9. Network Issue between the two nodes  

This issue occurs when there is an actual network disruption between the two ADM HA nodes. Verify that the IP addresses of the ADM HA nodes are reachable from each other.
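
A basic reachability check can be sketched with ping. It assumes ICMP is allowed between the nodes, and it does not prove that the database port is open. The helper name check_peer is hypothetical.

```shell
# check_peer <peer-ip>   (hypothetical helper)
# Sends three echo requests and prints "reachable" or "unreachable".
check_peer() {
    if ping -c 3 "$1" > /dev/null 2>&1; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# Example use (run on each node against the other node's IP; 10.0.0.2 is a placeholder):
#   check_peer 10.0.0.2
```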

10. SSL certificate expiry (applicable only to 13.0 64.35)

Expired DB SSL certificates can also break database streaming between the ADM HA nodes. If the SSL certificate has expired, the "join_streaming_replication.sh" command does not restore streaming. This issue appears in ADM 13.0 64.35.

Resolution

Follow the steps in https://support.citrix.com/article/CTX328037

11. Track the DB sync status after clicking the DB Sync Button (System-Deployment) 

  1. Log in to the ADM secondary node shell as the nsrecover user.
  2. View the file /var/mps/log/pg_basebackup_**.log.
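
Because the file name carries a date, several pg_basebackup logs can exist at once. The sketch below picks the most recently modified one so that it can be followed with tail. The helper name latest_basebackup_log is an assumption.

```shell
# latest_basebackup_log [log-dir]   (hypothetical helper)
# Prints the path of the newest pg_basebackup_*.log in the directory
# (default /var/mps/log), or nothing if no such file exists.
latest_basebackup_log() {
    ls -t "${1:-/var/mps/log}"/pg_basebackup_*.log 2>/dev/null | head -n 1
}

# Example use on the secondary node:
#   tail -f "$(latest_basebackup_log)"
```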

Note

Missing_ADM_IP --> The IP address of the primary or secondary node that is missing from the file.

pg_basebackup_**.log --> The ** in the file name stands for the date.