On November 11, 2025, between 16:25 and 19:13 UTC, Atlassian customers were unable to access Bitbucket Cloud services. Customers experienced 1 hour and 16 minutes of degraded performance and 1 hour and 32 minutes during which the Bitbucket Cloud website, APIs, and Git hosting were unavailable. The event was triggered by a code change that unintentionally affected how we evaluate feature flags, impacting all customers. The incident was detected within 5 minutes by automated monitoring and mitigated by scaling multiple services and deploying a fix that returned Atlassian systems to a known good state. The total time to full resolution was approximately 2 hours and 48 minutes.
The overall impact occurred between 16:25 UTC and 19:13 UTC on November 11, 2025, and affected Bitbucket Cloud. Between 16:25 UTC and 16:50 UTC, users experienced degraded performance of Git services and pull requests within the Bitbucket Cloud site. Starting at 16:50 UTC, users were unable to access Bitbucket Cloud and associated services at all.
During a routine deployment, a code change negatively impacted a component used for feature flag evaluation. To mitigate this issue, the Bitbucket engineering team manually scaled up Git services. This inadvertently hit a regional limit with our hosting provider, causing new Git service instances to fail to start. This ultimately degraded multiple dependent services and increased the number of failed requests to Bitbucket Cloud's website and public APIs.
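For illustration only, and not a description of Atlassian's actual implementation: a common way to limit the blast radius of a degraded feature flag component is to fall back to a known-good default when evaluation fails. The service object, flag names, and defaults below are hypothetical.

    import logging

    logger = logging.getLogger(__name__)

    # Hypothetical flag names and safe defaults, for illustration only.
    SAFE_DEFAULTS = {
        "git-hosting.new-code-path": False,  # prefer the known-good path if evaluation is unavailable
    }

    def evaluate_flag(flag_service, flag_name):
        """Evaluate a feature flag, falling back to a safe default if the
        evaluation component is degraded or unreachable."""
        try:
            return flag_service.is_enabled(flag_name)
        except Exception:
            logger.warning("Flag evaluation failed for %s; using safe default", flag_name)
            return SAFE_DEFAULTS.get(flag_name, False)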
Our team immediately began investigating the issue and testing various mitigations, including scaling the impacted services, to reduce the effects of the change. However, these efforts were unsuccessful due to an unexpected scaling limit imposed by our underlying hosting platform. Attempts to roll back the code change were also unsuccessful, as the platform's scaling limit prevented new infrastructure from being provisioned during the rollback. In particular, any attempt to provision new infrastructure generated a high volume of calls in a short period, leading to failures, retries, and a feedback loop that worsened the situation.
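As a general sketch of the kind of pattern that dampens this sort of retry feedback loop (the provisioning call, attempt counts, and delays below are hypothetical placeholders, not Atlassian's tooling): capped exponential backoff with jitter spreads retries out instead of concentrating them into a short window.

    import random
    import time

    def provision_with_backoff(provision_instance, max_attempts=5,
                               base_delay=1.0, max_delay=60.0):
        """Retry a provisioning call with capped exponential backoff and full
        jitter, so failed attempts do not flood the platform with calls."""
        for attempt in range(max_attempts):
            try:
                return provision_instance()
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # give up after the final attempt
                delay = min(max_delay, base_delay * (2 ** attempt))
                time.sleep(random.uniform(0, delay))  # jitter spreads retries over time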
To address this, the team scaled down certain services to reduce load on the platform, which allowed for the successful deployment of a fix and restoration of service. Once the fix was in place, healthy services were scaled back up to meet customer demand.
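A minimal sketch of that recovery sequence, under the assumption of a generic orchestration interface (the Orchestrator type, service names, and replica counts below are illustrative, not Atlassian's systems): shed load first so the fix can roll out, then restore capacity.

    from typing import Protocol

    class Orchestrator(Protocol):
        def scale(self, service: str, replicas: int) -> None: ...
        def deploy(self, service: str, version: str) -> None: ...

    def recover(orch: Orchestrator, noncritical_services, fix_version):
        """Scale down noncritical services, deploy the fix, then scale back up."""
        for svc in noncritical_services:
            orch.scale(svc, 0)                    # free platform capacity
        orch.deploy("web-frontend", fix_version)  # roll out the known-good build
        for svc in noncritical_services:
            orch.scale(svc, 3)                    # example replica count to meet demand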
We recognize the significant impact outages have on our customers’ productivity. Despite our robust testing and preventative measures, this particular issue related to feature flag evaluation was not detected in other environments and only became apparent under high load conditions that had not previously occurred. The incident has provided valuable information about our hosting platform’s scaling limits, and we are actively applying these learnings to enhance our resilience and response times.
To help prevent similar incidents in the future, we have taken the following actions:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support