Issue seen with Cloud Connectors and DaaS: brokering failed intermittently, resulting in users losing their sessions and being unable to log back in.
Bug fix
The DDC VM became unhealthy when the auto-heal service hit a race condition: it attempted to restart the DDC while the DDC was in the middle of a provisioning operation. At the time, auto-heal was enabled for both DDC instances and container instances. In some provisioning/upgrade cases, a specific condition triggered the auto-heal service before the DDC was ready to be monitored; the resulting restart interrupted the provisioning scripts and left the DDC in an unstable state. Although some teams were alerted at the time, no immediate remediation was taken because the second DDC was expected to take over, which led to the second problem.
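The fix pattern here can be sketched as a guard that makes the auto-heal restart mutually exclusive with provisioning. This is a minimal illustration, not the actual service code; the class, method names, and flag are all hypothetical.

```python
import threading


class AutoHeal:
    """Sketch of an auto-heal guard (hypothetical names, not the real service).

    A restart is only attempted when the node is unhealthy AND no
    provisioning operation is in flight, avoiding the race described above.
    """

    def __init__(self):
        self._provisioning = threading.Event()  # set while provisioning runs
        self.restarts = 0

    def begin_provisioning(self):
        self._provisioning.set()

    def end_provisioning(self):
        self._provisioning.clear()

    def heal(self, healthy: bool) -> bool:
        # Skip the restart entirely while provisioning scripts are running,
        # so they can never be interrupted mid-operation.
        if not healthy and not self._provisioning.is_set():
            self.restarts += 1  # stand-in for the actual restart action
            return True
        return False
```

With this guard, an unhealthy signal received during provisioning is ignored rather than triggering a disruptive restart; auto-heal acts only once provisioning has completed.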
The secondary DDC did not take over brokering of launch, machine-creation, and admin operations while the other DDC was unavailable. This was also caused by the interrupted provisioning scripts: because the tags were not set completely, the load balancers did not route any requests to the secondary DDC.
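The tag problem amounts to an eligibility check: a node should only join the load-balancer pool once every required tag is present. A minimal sketch of that check, with entirely hypothetical tag names:

```python
# Hypothetical required tags; the real tag set is not specified in this note.
REQUIRED_TAGS = {"role", "site", "ready"}


def eligible_for_load_balancing(tags: dict) -> bool:
    """Return True only when every required tag is present and non-empty.

    If provisioning is interrupted before all tags are written, the node
    stays out of the pool instead of silently receiving no traffic.
    """
    return all(tags.get(tag) for tag in REQUIRED_TAGS)
```

Checking tag completeness explicitly also makes the failure visible: a partially tagged node can be flagged for remediation rather than being quietly skipped by the load balancers.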
Fixes for both issues have been implemented to ensure this same problem does not arise again. The issues were tracked under XAC-61347 and are fixed in Cloud release 126.