Repositories are not accessible
Incident Report for Atlassian Bitbucket
Postmortem

SUMMARY

On Aug 02, 2021, between 08:00 AM and 10:50 AM PST, a subset of customers on Atlassian’s Cloud Platform using Bitbucket were unable to access their repositories. The event was triggered by a configuration change applied while adding storage capacity. The change included an incorrect configuration made by one of our vendors, which impacted 6% of our storage capacity. The incident was detected within 6 minutes by our automated monitoring system and mitigated by rolling back the incorrect configuration and redeploying the affected services, which returned Atlassian systems to a known good state. The total time to resolution was about 2 hours and 52 minutes.

IMPACT

Customers with repositories on any of the unavailable storage volumes were unable to access their repos for the duration of the incident.

The impact lasted from Aug 02, 2021, 08:00 AM PST to Aug 02, 2021, 10:50 AM PST on Bitbucket Cloud. The incident disrupted only 6% of our storage capacity; customers with repositories on that storage were unable to access them.

ROOT CAUSE

The issue was caused by an incorrect configuration change made by one of our vendors. As a result, affected Bitbucket Cloud customers could not access their repositories and received HTTP 504 errors. More specifically, we were adding storage capacity, and the misconfiguration involved a repurposed IP address, which caused the affected storage volumes to become inaccessible. The root cause of the incident was the failure to detect the misconfiguration.

REMEDIAL ACTIONS PLAN & NEXT STEPS

We know that outages are impactful to your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because it involved a very specific kind of change related to the onboarding of additional storage capacity.

We are prioritizing the following improvement actions to avoid repeating this type of incident:

  • Implementing a solution so that storage volumes are mounted with a preferred IP address and can fail over to a pool of secondary IP addresses when the primary one becomes unavailable.
  • Implementing a detection and alerting mechanism that validates that each storage volume is available.
  • Working with our external vendor on preventative measures as well as improved detection on their end.
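
As an illustration only, the first two actions could be sketched along the following lines. This is not Atlassian's implementation: the function names, the volume names, the IP addresses, and the use of a TCP connection to port 2049 as the availability check are all assumptions made for the example.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

Probe = Callable[[str], bool]


def tcp_probe(address: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to address:port succeeds.

    Port 2049 is an assumption for this sketch; the report does not
    say which storage protocol is in use.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_address(preferred: str, secondaries: List[str],
                 probe: Probe = tcp_probe) -> Optional[str]:
    """Return the preferred address if reachable, else the first
    reachable secondary, else None."""
    for address in [preferred, *secondaries]:
        if probe(address):
            return address
    return None


def unavailable_volumes(volumes: Dict[str, Tuple[str, List[str]]],
                        probe: Probe = tcp_probe) -> List[str]:
    """Return the names of volumes with no reachable address.

    A non-empty result is what a detection job would alert on.
    """
    return [
        name
        for name, (preferred, secondaries) in volumes.items()
        if pick_address(preferred, secondaries, probe) is None
    ]


if __name__ == "__main__":
    # Hypothetical inventory and a fake probe, so the demo needs no network.
    reachable = {"10.0.0.2", "10.0.1.1"}

    def fake_probe(address: str) -> bool:
        return address in reachable

    inventory = {
        "vol-a": ("10.0.0.1", ["10.0.0.2"]),  # preferred down, secondary up
        "vol-b": ("10.0.1.1", []),            # preferred up
        "vol-c": ("10.0.2.1", ["10.0.2.2"]),  # nothing reachable
    }
    print(pick_address(*inventory["vol-a"], probe=fake_probe))
    print(unavailable_volumes(inventory, probe=fake_probe))
```

Passing the probe in as a parameter keeps the selection and validation logic testable without touching real storage; a production check would more likely verify that the mounted volume can actually be read, not just that an address accepts connections.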

We apologize to those customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support

Posted Aug 24, 2021 - 17:30 UTC

Resolved
The issue has been resolved and the service is operating normally.
The root cause was identified as an incorrect configuration applied while onboarding additional storage capacity for Bitbucket Cloud. This resulted in a subset of storage capacity becoming unavailable. The problem was resolved after we rolled back the misconfiguration and restarted Bitbucket services.
Posted Aug 02, 2021 - 18:20 UTC
Monitoring
We have identified the root cause of the repository accessibility issue and have mitigated the problem. We are now monitoring closely.
Posted Aug 02, 2021 - 17:52 UTC
Identified
We have identified the faulty service that is causing the issue and are working on restoring accessibility for all repositories. We will provide another update shortly.
Posted Aug 02, 2021 - 17:32 UTC
Update
We are still investigating the issue with some repositories being unavailable.
We are getting close to identifying the root cause and addressing it.
For repositories that are affected, all services will be inaccessible.
Posted Aug 02, 2021 - 16:58 UTC
Update
We are still investigating the issue with some repositories being unavailable.
We are getting close to identifying the root cause and addressing it.
Posted Aug 02, 2021 - 16:48 UTC
Update
We have identified that multiple components are currently affected and are looking into addressing this as a priority.
Posted Aug 02, 2021 - 15:37 UTC
Investigating
We are investigating an issue with Bitbucket Cloud hosted repositories not being accessible over the UI or command line. We will provide more details within the next hour.
Posted Aug 02, 2021 - 15:35 UTC
This incident affected: Website, API, Git via SSH, Git via HTTPS, Webhooks, Source downloads, Pipelines, and Git LFS.