Orchestrator is unavailable

Postmortem

Customer impact

The Orchestrator service running in UiPath Automation Cloud Public Sector experienced an outage on February 26, 2025. Jobs could not be started from 8:00 PM UTC until 8:30 PM UTC.

Root cause

The Orchestrator team was deploying upgrades to Kubernetes and other third-party components to fix potential security vulnerabilities. One step in the upgrade pipeline failed and caused connection issues with Orchestrator. The automatic failover did not trigger as expected, which led to customer impact.

Detection

Failures in our synthetic traffic monitoring triggered alerts immediately.

Response

We fixed the problem by retrying the failed deployment step.

Follow-up

  • We are still investigating why the upgrade step failed. Once this is identified, we will improve the upgrade system to avoid similar failures.
  • We are adding automatic retries to this step.
  • We will fix the automatic failover mechanism. Once fixed, we will perform regular failover drills.
Posted Feb 28, 2025 - 19:21 UTC

Resolved

Orchestrator went down at 12:10 PM PST due to a deployment and recovered at 12:35 PM PST, after the deployment completed.
Posted Feb 26, 2025 - 20:57 UTC

Investigating

Orchestrator is inaccessible and is returning 500 errors.
Posted Feb 26, 2025 - 20:37 UTC
This incident affected: Orchestrator, Action Center, and Apps.