Bitbucket Cloud website performance degraded
Incident Report for Atlassian Bitbucket
Postmortem

SUMMARY

On February 22, 2024, between 07:22 UTC and 13:30 UTC, Atlassian customers using Bitbucket Cloud experienced degraded performance on the website and APIs. The vacuum process was not running frequently enough on our high-traffic database tables, which impaired the database’s ability to handle requests. As a result, connection pools became saturated, response times increased, and a growing number of requests timed out completely.

After the database recovered at 13:30 UTC, Bitbucket Pipelines experienced build scheduling delays as it processed the backlog of jobs. Additional resources were added to Bitbucket Pipelines and the backlog was cleared in full by 17:30 UTC.

IMPACT

Affected customers experienced significant delays running Bitbucket Pipelines and increased latency when accessing the bitbucket.org website and APIs for the duration of the incident. Git requests over HTTPS and SSH were unaffected.

ROOT CAUSE

The incident was caused by an issue with the routine autovacuuming of our active database tables: vacuuming did not run frequently enough on high-traffic tables, which impaired the database’s ability to serve requests. The resulting slowdowns affected a variety of Bitbucket services and left a large backlog of unscheduled pipelines in the queue.

REMEDIAL ACTION PLAN & NEXT STEPS

We know that outages impact your productivity. We are prioritizing the following improvement actions to reduce recovery time, limit impact, and avoid repeating these types of incidents in the future:

  • Reconfigure the vacuuming threshold for database tables with high write activity.
  • Adjust alert thresholds to catch this behavior earlier and reduce potential impact.
  • Tune autoscaling and load-shedding behavior for Pipelines services and increase build runner capacity.
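To illustrate why a vacuuming threshold matters for tables with high write activity, here is a minimal sketch. It assumes PostgreSQL-style autovacuum semantics, which this report does not confirm: a table is vacuumed once its dead rows exceed a base threshold plus a scale factor times the table's row count. The default values of 50 and 0.2 are PostgreSQL's documented defaults. The table name and the tuned values are hypothetical and are not the settings Atlassian applied.

```python
# Sketch only: assumes PostgreSQL-style autovacuum, which the report does not confirm.
# In PostgreSQL, autovacuum vacuums a table once
#   dead_rows > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * row_count


def autovacuum_trigger(row_count: int,
                       base_threshold: int = 50,      # PostgreSQL default
                       scale_factor: float = 0.2) -> float:  # PostgreSQL default
    """Return the dead-row count above which autovacuum would vacuum the table."""
    return base_threshold + scale_factor * row_count


def per_table_override_sql(table: str, scale_factor: float, base_threshold: int) -> str:
    """Build an ALTER TABLE statement that sets per-table autovacuum parameters."""
    return (
        f"ALTER TABLE {table} SET ("
        f"autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_vacuum_threshold = {base_threshold});"
    )


if __name__ == "__main__":
    rows = 100_000_000  # hypothetical size of a high-traffic table

    # With the defaults, the trigger point grows with the table size.
    print("default trigger:", autovacuum_trigger(rows))

    # Hypothetical tighter settings for a high-write table.
    print("tuned trigger:  ", autovacuum_trigger(rows, base_threshold=1000, scale_factor=0.01))

    # "example_high_write_table" is a placeholder name, not a real Bitbucket table.
    print(per_table_override_sql("example_high_write_table", 0.01, 1000))
```

Because the trigger point scales with table size, a large table under default-style settings can accumulate many dead rows before a vacuum starts. Lowering the scale factor for specific tables makes vacuuming run more often on exactly the tables that churn the most.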

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support

Posted Mar 12, 2024 - 17:58 UTC

Resolved
This incident has been resolved.
Posted Feb 22, 2024 - 13:57 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 22, 2024 - 13:39 UTC
Update
We are continuing to investigate the cause of the degraded performance.
Posted Feb 22, 2024 - 11:56 UTC
Update
We are continuing to investigate this issue.
Posted Feb 22, 2024 - 10:13 UTC
Update
We are continuing to investigate the cause of the degraded performance.
Posted Feb 22, 2024 - 10:08 UTC
Investigating
We are aware of an incident impacting the performance of the Bitbucket Cloud website. An update will be provided soon.
Posted Feb 22, 2024 - 08:37 UTC
This incident affected: Website, API, Webhooks, and Pipelines.