When you run the HA failover command in an Amazon Web Services (AWS) high availability environment, the failover might not occur. This article explains two such scenarios and their resolutions.
After you configure a high availability setup in an AWS environment, the failover process requires the NetScaler VPX instance to contact the REST API server. This communication uses the IAM user account that is associated with the NetScaler VPX instance. If the IAM user does not have the appropriate permissions, the appliance might not respond, the failover might fail, or both.
For example, when initiating a high availability failover on the secondary appliance, /var/log/ns.log and /var/log/cloud-ha-daemon.log might record the following error messages:
Mar 2 02:36:49 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:49 <local0.info> ns last message repeated 2 times
Mar 2 02:36:49 <local0.info> ns ptpd: protocol error: 4 - Interrupted system call
Mar 2 02:36:49 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:50 <local0.info> ns last message repeated 2 times
Mar 2 02:36:50 <local0.info> ns ptpd: protocol error: 4 - Interrupted system call
Mar 2 02:36:50 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:50 <local0.info> ns last message repeated 2 times
Mar 2 02:36:50 <local0.info> ns ptpd: protocol error: 4 - Interrupted system call
Mar 2 02:36:50 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:50 <local0.info> ns last message repeated 2 times
Mar 2 02:36:50 <local0.info> ns ptpd: protocol error: 4 - Interrupted system call
Mar 2 02:36:50 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:51 <local0.info> ns last message repeated 2 times
Mar 2 02:36:51 <local0.info> ns ptpd: protocol error: 4 - Interrupted system call
Mar 2 02:36:51 <local0.info> ns ptpd: nsnet_read: select FD=4 failed with 4: Interrupted system call
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG 2 interfaces will move Primary from instance i-9e5c23ed to i-7cf18e0f
Mar 2 02:36:51 <local0.notice> 10.217.245.106 03/02/2013:02:36:49 GMT 0-PPE-0 : EVENT STATECHANGE 114 0 : Device "self node 10.217.245.106" - State Primary (Remote node - ACTIVE, UP)
Mar 2 02:36:51 <local0.alert> 10.217.245.106 03/02/2013:02:36:51 GMT 0-PPE-0 : PITBOSS1 Message 115 0 : "Sat Mar 2 02:36:49 2013 PB_OP_CHANGE_POLICY new policy 0x28b5 (10421)"
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG Detaching Id eni-1276217f attachId eni-attach-5aaa4d31
Mar 2 02:36:51 <local0.notice> 10.217.245.106 03/02/2013:02:36:51 GMT 0-PPE-0 : EVENT DEVICEUP 117 0 : Device "server_svc_NSSVC_HTTP_127.0.0.1:80(internal)" - State UP
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG Final Call AWSAccessKeyId=AKIAI4MDH73LWHPYWNBQ&Signature=DJq%2Fl8Cc9ub1EBIbiOrA8pTOV6MX8xw6CwGKDyse7I4%3D&Version=2012-06-15&Timestamp=2013-03-02T02%3A36%3A51Z&Action=DetachNetworkInterface&AttachmentId=eni-attach-5aaa4d31&Force=True&SignatureMethod=HmacSHA256&SignatureVersion=2 len 268
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS COMMON... status 403 res 0 failed 1
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS API request failed ret = 403
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG retrying.... 2
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS COMMON... status 403 res 0 failed 1
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS API request failed ret = 403
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG retrying.... 1
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS COMMON... status 403 res 0 failed 1
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS API request failed ret = 403
Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG Failed AWS API DetachNetworkInterface due to 403
The 403 response typically indicates that the IAM user account does not have the permissions it needs to contact the REST API. Set the permissions for the IAM user associated with the NetScaler VPX instance as specified in the ICG.
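To find which API call was rejected, you can scan the log for the final awsconfig failure line. The following Python sketch is illustrative only and is not a Citrix tool. The function names are hypothetical, and the pattern assumes that the failure line always has the form "AWSCONFIG Failed AWS API <Action> due to <status>", as in the log excerpt in this article.

```python
import re

# Pattern for the final failure line written by awsconfig.
# Assumption: the wording matches the excerpt in this article.
FAILED_CALL = re.compile(
    r"AWSCONFIG Failed AWS API (?P<action>\w+) due to (?P<status>\d+)"
)


def find_failed_aws_calls(log_lines):
    """Return (action, status) pairs for every failed AWS API call."""
    failures = []
    for line in log_lines:
        match = FAILED_CALL.search(line)
        if match:
            failures.append((match.group("action"), int(match.group("status"))))
    return failures


def permission_failures(log_lines):
    """Return the actions that were rejected with HTTP status 403."""
    return [
        action
        for action, status in find_failed_aws_calls(log_lines)
        if status == 403
    ]


# Three lines taken from the log excerpt in this article.
SAMPLE_LOG = [
    "Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG AWS API request failed ret = 403",
    "Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG retrying.... 1",
    "Mar 2 02:36:51 <local0.info> ns awsconfig: AWSCONFIG Failed AWS API DetachNetworkInterface due to 403",
]

for action in permission_failures(SAMPLE_LOG):
    print(f"IAM user is not permitted to call {action}")
```

On an appliance, the same functions could be given the lines of /var/log/ns.log instead of the sample list. An action reported here is one to check in the permissions of the IAM user.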
The following list indicates the IAM user permissions required for high availability failover to work:
If the Access Key and Secret Key are different between the appliances that are part of the high availability setup, the HA failover does not work.
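One way to spot a key mismatch is to compare the AWSAccessKeyId values that each appliance writes in its "AWSCONFIG Final Call" log lines. The following Python sketch is a rough aid under stated assumptions: the function names are hypothetical, the key IDs in the sample are placeholders, and the log exposes only the Access Key ID, so a differing Secret Key cannot be detected this way.

```python
import re
from urllib.parse import parse_qs

# Captures the query string that follows "AWSCONFIG Final Call".
# Assumption: the line format matches the log excerpt in this article.
FINAL_CALL = re.compile(r"AWSCONFIG Final Call (?P<query>\S+)")


def access_key_ids(log_lines):
    """Collect every AWSAccessKeyId found in 'Final Call' log lines."""
    ids = set()
    for line in log_lines:
        match = FINAL_CALL.search(line)
        if match:
            params = parse_qs(match.group("query"))
            ids.update(params.get("AWSAccessKeyId", []))
    return ids


def access_keys_match(node_a_lines, node_b_lines):
    """True only if both logs contain key IDs and the sets are identical."""
    ids_a = access_key_ids(node_a_lines)
    ids_b = access_key_ids(node_b_lines)
    return bool(ids_a) and ids_a == ids_b


# Hypothetical log lines with placeholder key IDs, one per appliance.
NODE_A = [
    "ns awsconfig: AWSCONFIG Final Call AWSAccessKeyId=AKIAEXAMPLEKEYAAAAAA"
    "&Action=DetachNetworkInterface&SignatureVersion=2 len 120",
]
NODE_B = [
    "ns awsconfig: AWSCONFIG Final Call AWSAccessKeyId=AKIAEXAMPLEKEYBBBBBB"
    "&Action=DetachNetworkInterface&SignatureVersion=2 len 120",
]

if not access_keys_match(NODE_A, NODE_B):
    print("Access Key IDs differ between the two appliances")
```

If the key IDs differ, configure both appliances with the same Access Key and Secret Key before retrying the failover.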
Citrix Documentation - Understanding the Causes of Failover