On July 13, 2023, between 4:04 AM UTC and 6:42 AM UTC, Atlassian customers using Bitbucket Cloud were unable to retrieve a list of branches. The event was triggered by a change to the rate limiting for that endpoint, which caused the rate limit to be applied globally. The incident was detected within 5 minutes by automated monitoring and mitigated by reverting the responsible change, which returned Atlassian systems to a known good state. The total time to resolution was about 2 hours and 38 minutes.
Every part of Bitbucket Cloud that retrieves a list of repository branches was affected, including the pull request creation pages and API as well as pipelines builds. Customers experienced this impact for approximately 2.5 hours.
The issue was caused by a change to the repository branches list endpoint that made the endpoint incorrectly apply the repository-level rate limit globally. As a result, users making calls to retrieve their list of branches received HTTP 429 errors. The root cause of the incident was that the deployment validations failed to detect the bug.
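For readers who want a concrete picture of this failure mode, the following is a minimal sketch and not Atlassian's actual implementation. It assumes a simple counter-based limiter, and every name in it (`FixedWindowLimiter`, `per_repo_key`, `global_key`, the limit of 2) is hypothetical. It shows how a limit meant to be tracked per repository turns into a shared limit once the repository is dropped from the counter key.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical limiter: counts requests per key in one window.

    Window expiry is omitted to keep the sketch short.
    """

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key):
        # Count the request, then check it against the limit for this key.
        self.counts[key] += 1
        return self.counts[key] <= self.limit


def per_repo_key(repo):
    # Intended behaviour: each repository gets its own counter.
    return f"branches:{repo}"


def global_key(repo):
    # Assumed shape of the bug: the repository is ignored,
    # so all repositories share a single counter.
    return "branches"


def simulate(key_fn, repos, limit=2):
    """Return the HTTP status each request would receive."""
    limiter = FixedWindowLimiter(limit)
    return [200 if limiter.allow(key_fn(repo)) else 429 for repo in repos]


requests = ["repo-a", "repo-a", "repo-b"]
print(simulate(per_repo_key, requests))
print(simulate(global_key, requests))
```

With the per-repository key, the request for `repo-b` is counted separately and succeeds. With the shared key, the same request is rejected with HTTP 429 because earlier traffic to `repo-a` has already used up the limit.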
We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because the change was related to a very specific kind of legacy case that was not picked up by our automated continuous deployment suites and manual test scripts.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
To minimize the impact of breaking changes to our environments, we will implement additional preventative measures such as:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support