Config sync may fail after upgrade in HA/Cluster deployments

Article ID: CTX322863

Description

After upgrading from an older release to 13.0 74.14 or later, config sync has sometimes been observed to fail continuously in HA/Cluster deployments. The failure can have several causes, such as:
 

  1. Internal user login is disabled but ns_comm_key is not configured.
  2. The ssh_host_rsa_key private and public keys do not match.
  3. The RPC node password differs among nodes.
  4. A disk space shortage, or an unknown cause that has not yet been identified.

See https://support.citrix.com/article/CTX214822 to determine whether the issue is caused by case 1 (listed above).

If case 1 is not the cause, check for case 2, which this article covers.
  • To make config sync more secure, “StrictHostKeyChecking” is now enabled when the SSH connection to the Primary/CCO node is made.
  • The content of the CCO/Primary node’s ssh_host_rsa_key.pub file is fetched over a secure channel.
  • The NON-CCO/Secondary node uses this public key to authenticate the identity of the CCO/Primary node through StrictHostKeyChecking.
  • This ensures that the NON-CCO/Secondary node establishes a connection with the correct CCO/Primary node.
  • In a man-in-the-middle attack, config sync fails because the public key sent by the attacker does not match the fetched public key.
  • However, if the ssh_host_rsa_key.pub file on the CCO/Primary node itself does not hold the correct key, StrictHostKeyChecking fails, which causes config sync to fail continuously.
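
The check described in the list above can be illustrated with plain OpenSSH tooling. This is only a sketch of the general technique, not the appliance's internal implementation: the function names, the example address, and the file locations are hypothetical, and it assumes the fetched public key has been saved to a local file.

```shell
#!/bin/sh
# Sketch only: pin a fetched host public key and require it on connect.
# pin_host_key and connect_pinned are hypothetical helper names.

# Write a known_hosts entry "<host> <keytype> <base64-key>" from a .pub file.
# The trailing comment field of the .pub file (for example "root@ns") is dropped.
pin_host_key() {
    host="$1"
    pub_file="$2"
    known_hosts="$3"
    awk -v h="$host" 'NR==1 {print h, $1, $2}' "$pub_file" > "$known_hosts"
}

# Connect with strict checking against the pinned key only.
# ssh refuses the connection if the key offered by the server differs.
connect_pinned() {
    target="$1"
    known_hosts="$2"
    ssh -o StrictHostKeyChecking=yes -o UserKnownHostsFile="$known_hosts" "$target" true
}
```

With a pinned entry in place, a server that presents a different host key (an attacker, or a primary whose .pub file does not correspond to its private key) is rejected rather than trusted on first use.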

Identify the issue:

Run the following command on the CCO/Primary node to check whether the public and private keys match:
$ ssh-keygen -y -f /nsconfig/ssh/ssh_host_rsa_key
The command prints the public key derived from the private key. This is the public key that the CCO/Primary node sends to the NON-CCO/Secondary node during the SSH connection. Compare the output with the content of “/nsconfig/ssh/ssh_host_rsa_key.pub”. The key type and key data must be identical; a trailing comment such as root@ns is not part of the key and can be ignored. For example:
$ ssh-keygen -y -f /nsconfig/ssh/ssh_host_rsa_key
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/mUBEWQVdgwEAyF1/RyN6ZS0WVM45brXptqk95pHu6dF41LDLByCT4PH5mqhWhneh6nsdS3XcgPTCgtZP7R4XDh4X3xJ9tje26VCCDcFApB9OGXxdTSqS0X+vVwLLW9uX93xpxfNRCI7pJqflhe3xCgjE6SN6wfgLfnj+7xfikkbxDUF5KSXJglHHz1jkLbSX2fCMEnv6KxhMUBDefGpY9QXqr1Ea62qaigOnMcknC5kbZE3oeaEiUYw25YfdqDE70yUx8rnZmbEMdQT3ldedRprKSiTVpubwFmiHDGJKp/LBOowZvZy/zFCwJdvBZRcePRcpFiMHxc+l0e2cKyVN
$ cat /nsconfig/ssh/ssh_host_rsa_key.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/mUBEWQVdgwEAyF1/RyN6ZS0WVM45brXptqk95pHu6dF41LDLByCT4PH5mqhWhneh6nsdS3XcgPTCgtZP7R4XDh4X3xJ9tje26VCCDcFApB9OGXxdTSqS0X+vVwLLW9uX93xpxfNRCI7pJqflhe3xCgjE6SN6wfgLfnj+7xfikkbxDUF5KSXJglHHz1jkLbSX2fCMEnv6KxhMUBDefGpY9QXqr1Ea62qaigOnMcknC5kbZE3oeaEiUYw25YfdqDE70yUx8rnZmbEMdQT3ldedRprKSiTVpubwFmiHDGJKp/LBOowZvZy/zFCwJdvBZRcePRcpFiMHxc+l0e2cKyVN root@ns

If the two keys match, config sync is failing for some other reason and the root cause needs further investigation. If they do not match, the ssh_host_rsa_key key pair must be regenerated.
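
Comparing two long key strings by eye is error-prone, so the comparison can also be scripted. The following is a minimal sketch, assuming awk is available in the shell; host_key_matches is a hypothetical helper name, not a command shipped with the product. It compares only the key type and key data fields and ignores any trailing comment.

```shell
#!/bin/sh
# Sketch only: host_key_matches is a hypothetical helper.
# Succeeds (exit status 0) when the public key derived from the private key
# equals the key stored in the .pub file; fails otherwise.
host_key_matches() {
    priv_key="$1"
    pub_file="${2:-$1.pub}"   # defaults to "<private key path>.pub"

    # Field 1 is the key type, field 2 the base64 key data.
    derived=$(ssh-keygen -y -f "$priv_key" 2>/dev/null | awk 'NR==1 {print $1, $2}')
    stored=$(awk 'NR==1 {print $1, $2}' "$pub_file" 2>/dev/null)

    # An empty result means the private key could not be read at all.
    [ -n "$derived" ] && [ "$derived" = "$stored" ]
}

# Example call on the CCO/Primary node:
#   host_key_matches /nsconfig/ssh/ssh_host_rsa_key && echo "keys match" || echo "keys differ"
```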

Resolution

  1. Before regenerating the keys, delete the existing key files “/nsconfig/ssh/ssh_host_rsa_key” and “/nsconfig/ssh/ssh_host_rsa_key.pub”.
  2. Run the following command to regenerate the keys:
    • $ ssh-keygen -t rsa -f /nsconfig/ssh/ssh_host_rsa_key
  3. When prompted for a passphrase during key generation, press Enter to leave it empty.
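
The three steps can also be run without prompts. This is a sketch under the assumption that the standard ssh-keygen options -N (new passphrase, here empty) and -q (quiet) are acceptable in your environment; regenerate_host_key is a hypothetical helper name and is not part of the original procedure.

```shell
#!/bin/sh
# Sketch only: regenerate_host_key is a hypothetical helper that mirrors
# the manual steps: delete both key files, then create a new RSA key pair
# with an empty passphrase.
regenerate_host_key() {
    key="$1"

    # Step 1: remove the existing private and public key files.
    rm -f "$key" "$key.pub"

    # Steps 2 and 3: generate a new pair; -N "" supplies the empty passphrase
    # so that ssh-keygen does not prompt for one.
    ssh-keygen -q -t rsa -N "" -f "$key"
}

# Example call on the CCO/Primary node:
#   regenerate_host_key /nsconfig/ssh/ssh_host_rsa_key
```

After regeneration, running ssh-keygen -y -f on the new private key should print the same key type and key data that the new .pub file contains.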