On JUL 13, 2023, between 13:48 and 16:05 UTC, Atlassian customers using Bitbucket Cloud were experiencing degraded performance for git operations. The event was triggered by a bug that was deployed to production. The changes included the introduction of a bug that bypassed a critical cache for git operations which impacted all Bitbucket Cloud customers. The incident was detected within 10 minutes by automated monitoring and mitigated by rolling back the code changes and the redeployment of some services which put Atlassian systems into a known good state. The total time to resolution was about 2 hours & 17 minutes.
The overall impact was between JUL 13, 2023, 13:48 UTC and JUL 13, 2023, 16:05 UTC on Bitbucket Cloud products. The Incident caused service disruption to all customers where they experienced slow response times or failures when interacting with repository data. As a result of network saturation causing connections to queue, customers experienced increased latency and error rates across Bitbucket Cloud services.
The issue was caused by a change to a feature flag that contained a bug which bypassed a critical cache. As a result, more requests were directly accessing the disks which led to increased latency and eventually, degraded the performance of some operations.
We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because the impact was subtle and took approximately 48 hours to surface after initial deployment. This slow ramping was not picked up by our automated continuous test scripts because it requires significant load to reach the tipping point where systems start to become degraded.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Atlassian Customer Support