GitHub Enterprise Cloud - EU Status - Incident History https://eu.githubstatus.com Statuspage Fri, 11 Apr 2025 21:41:48 +0000 EU - Incident with Pull Requests <p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>09:31</var> UTC</small><br><strong>Resolved</strong> - On April 9, 2025, between 7:01 UTC and 9:31 UTC, the Pull Requests service was degraded and failed to update refs for repositories with higher traffic. This was due to a repository migration creating a larger-than-usual number of enqueued jobs. This resulted in an increase in job failures and delays for non-migration sourced jobs.<br /><br />We declared an incident once we confirmed that this issue was not isolated to the migrating repository and that other repositories were also failing to process ref updates.<br /><br />We mitigated the incident by shifting the migration jobs to a different job queue.<br /><br />To avoid problems like this in the future, we are revisiting our repository migration process and are working to isolate potentially problematic migration workloads from non-migration workloads.</p><p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>09:00</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Pull Requests</p> Wed, 09 Apr 2025 09:31:53 +0000 https://eu.githubstatus.com/incidents/db23bzjfmv73 https://eu.githubstatus.com/incidents/db23bzjfmv73 Incident with Dependabot <p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>16:42</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Apr <var data-var='date'> 1</var>, <var data-var='time'>16:41</var> UTC</small><br><strong>Update</strong> - On March 28, 2025, between 18:02:44 and 20:18:31 UTC, Dependabot installations in the EU GitHub Enterprise Cloud failed due to unexpected side effects from a change. We mitigated the issue by rolling back the change and recommend reinstalling Dependabot as a workaround for any inconsistent states. Meanwhile, we are implementing measures to reduce detection and mitigation times for future incidents.</p><p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>20:35</var> UTC</small><br><strong>Update</strong> - We have identified the root cause and are now rolling out mitigations to the affected regions.</p><p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>18:26</var> UTC</small><br><strong>Investigating</strong> - Dependabot is down. We are currently investigating and working on a mitigation</p> Tue, 01 Apr 2025 16:42:03 +0000 https://eu.githubstatus.com/incidents/ycrrzx3r33md https://eu.githubstatus.com/incidents/ycrrzx3r33md EU - Disruption with Pull Request Ref Updates <p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>01:40</var> UTC</small><br><strong>Resolved</strong> - Between March 27, 2025, 23:45 UTC and March 28, 2025, 01:40 UTC the Pull Requests service was degraded and failed to update refs for repositories with higher traffic activity. This was due to a large repository migration that resulted in a larger-than-usual number of enqueued jobs while simultaneously impacting the git fileservers where the problematic repository was hosted. This resulted in an increase in queue depth, driven by retries of failed jobs, which caused delays for non-migration sourced jobs.<br /><br />We declared an incident once we confirmed that this issue was not isolated to the problematic migration and that other repositories were also failing to process ref updates. 
<br /><br />We mitigated the issue by stopping the migration and short-circuiting the remaining jobs. Additionally, we increased the worker pool for this job to reduce the time required to recover.<br /><br />As a result of this incident, we are revisiting our repository migration process and are working to isolate potentially problematic migration workloads from non-migration workloads.<br /></p><p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>01:40</var> UTC</small><br><strong>Update</strong> - This issue has been mitigated and we are operating normally.</p><p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>00:54</var> UTC</small><br><strong>Update</strong> - We are continuing to monitor for recovery.</p><p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>00:20</var> UTC</small><br><strong>Update</strong> - We believe we have identified the source of the issue and are monitoring for recovery.</p><p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>23:52</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Pull Requests</p> Fri, 28 Mar 2025 01:40:37 +0000 https://eu.githubstatus.com/incidents/5b2zkk5ykf2z https://eu.githubstatus.com/incidents/5b2zkk5ykf2z EU - Claude 3.5 Sonnet model is unavailable in Copilot <p><small>Feb <var data-var='date'>24</var>, <var data-var='time'>22:14</var> UTC</small><br><strong>Resolved</strong> - On February 24, 2025, between 21:42 UTC and 22:14 UTC the Claude 3.5 Sonnet model for GitHub Copilot Chat experienced degraded performance. During the impact, all requests to Claude 3.5 Sonnet resulted in an immediate error to the user. This was due to a misconfiguration within one of our infrastructure providers, which has since been mitigated.<br /><br />We are working to prevent this error from occurring in the future by implementing additional failover options. Additionally, we are updating our playbooks and alerting to reduce time to detection.</p><p><small>Feb <var data-var='date'>24</var>, <var data-var='time'>22:14</var> UTC</small><br><strong>Update</strong> - We were able to quickly identify the problem and resolve this issue. Claude 3.5 Sonnet is available again.</p><p><small>Feb <var data-var='date'>24</var>, <var data-var='time'>22:08</var> UTC</small><br><strong>Update</strong> - At this time, we are unable to serve requests to the Claude 3.5 Sonnet model in Copilot. No other models are affected. We are investigating the issue and will provide updates as we discover more information.</p><p><small>Feb <var data-var='date'>24</var>, <var data-var='time'>22:06</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> Mon, 24 Feb 2025 22:14:08 +0000 https://eu.githubstatus.com/incidents/0wnbsrm5hznc https://eu.githubstatus.com/incidents/0wnbsrm5hznc EU - Claude Sonnet unavailable in GitHub Copilot <p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>23:10</var> UTC</small><br><strong>Resolved</strong> - On February 12th, 2025, between 21:30 UTC and 23:10 UTC the Copilot service was degraded and all requests to Claude 3.5 Sonnet were failing. No other models were impacted.<br /><br />This was due to an issue with our upstream provider, which was detected within 12 minutes, at which point we raised the issue to our provider to remediate. 
GitHub is working with our provider to improve the resiliency of the service.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>23:10</var> UTC</small><br><strong>Update</strong> - Claude Sonnet is fully available in GitHub Copilot again. If you used an alternate model during the outage, you can switch back to Claude Sonnet.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>23:04</var> UTC</small><br><strong>Update</strong> - We are seeing a recovery with our Claude Sonnet model provider. We'll confirm once the problem is fully resolved.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>22:54</var> UTC</small><br><strong>Update</strong> - Our Claude Sonnet provider acknowledged the issue. They will provide us with the next update by 11:30 PM UTC / 3:30 PM PT. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>22:41</var> UTC</small><br><strong>Update</strong> - We escalated the issue to our Claude Sonnet model provider. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>21:52</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> Wed, 12 Feb 2025 23:10:49 +0000 https://eu.githubstatus.com/incidents/bnk91g911ycr https://eu.githubstatus.com/incidents/bnk91g911ycr EU - Incident with Actions <p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>16:33</var> UTC</small><br><strong>Resolved</strong> - Between January 31, 2025, 19:08 UTC and February 1, 2025, 16:33 UTC, GitHub-hosted runners for Actions experienced a disruption on GitHub Enterprise Cloud in the EU and Australia regions. 100% of Actions jobs in these regions using hosted runners failed with an error indicating an internal failure when running the job.<br /><br />The issue was caused by an unexpected removal of critical DNS records resulting from the interaction of two unrelated configuration changes. Each change was validated during deployment, but the hosted runner impact started after deployment validation was completed. 
A gap in alerting led to a significant delay in detecting the impact and initiating mitigation.<br /><br />We have completed monitoring improvements for earlier detection, updated the guidance on DNS changes, and are working on improvements to harden the deployment and validation of similar changes for the future.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>16:32</var> UTC</small><br><strong>Update</strong> - We have found and mitigated the issue; Actions should now be working again.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>15:54</var> UTC</small><br><strong>Update</strong> - We're continuing to investigate the cause of the issue with Actions</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>15:15</var> UTC</small><br><strong>Update</strong> - We have detected an issue where users are unable to run Actions using GitHub-hosted runners.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>15:05</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> Sat, 01 Feb 2025 16:33:09 +0000 https://eu.githubstatus.com/incidents/b2k99yy47cns https://eu.githubstatus.com/incidents/b2k99yy47cns EU - Incident with Webhooks <p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>02:27</var> UTC</small><br><strong>Resolved</strong> - On January 9, 2025, between 01:26 UTC and 01:56 UTC GitHub experienced widespread disruption to many services, with users receiving 500 responses when trying to access various functionality. This was due to a deployment that introduced a query that saturated a primary database server. On average, the error rate was 6% and peaked at 6.85% of update requests.<br /><br />We mitigated the incident by identifying the source of the problematic query and rolling back the deployment.<br /><br />We are investigating methods to detect problematic queries prior to deployment in order to prevent issues like this one, and to reduce our time to detection and mitigation in the future.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>02:19</var> UTC</small><br><strong>Update</strong> - We have identified the root cause and have deployed a fix. The majority of services have recovered. The Actions service is in the process of recovering.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>02:14</var> UTC</small><br><strong>Update</strong> - Copilot is operating normally.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>02:09</var> UTC</small><br><strong>Update</strong> - We have identified the root cause and have deployed a fix. Services are recovering.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>01:53</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> Thu, 09 Jan 2025 02:27:05 +0000 https://eu.githubstatus.com/incidents/0yxlxq5htrq3 https://eu.githubstatus.com/incidents/0yxlxq5htrq3 EU - Live updates on pages not loading reliably <p><small>Dec <var data-var='date'>17</var>, <var data-var='time'>16:00</var> UTC</small><br><strong>Resolved</strong> - On December 17th, 2024, between 14:33 UTC and 14:50 UTC, many users experienced intermittent errors and timeouts when accessing github.com. The error rate was 8.5% on average and peaked at 44.3% of requests. 
The increased error rate caused a broad impact across our services, such as the inability to log in, view a repository, open a pull request, and comment on issues. The errors were caused by our web servers being overloaded as a result of planned maintenance that unintentionally caused our live updates service to fail to start. As a result of the live updates service being down, clients reconnected aggressively and overloaded our servers.<br /><br />We only marked Issues as affected during this incident despite the broad impact. This oversight was due to a gap in our alerting while our web servers were overloaded. The engineering team's focus on restoring functionality led us to not identify the broad scope of the impact to customers until the incident had already been mitigated.<br /><br />We mitigated the incident by rolling back the changes from the planned maintenance to the live updates service and scaling up the service to handle the influx of traffic from WebSocket clients.<br /><br />We are working to reduce the impact of the live updates service's availability on github.com to prevent issues like this one in the future. We are also working to improve our alerting to better detect the scope of impact from incidents like this.</p><p><small>Dec <var data-var='date'>17</var>, <var data-var='time'>15:32</var> UTC</small><br><strong>Update</strong> - Issues is operating normally.</p><p><small>Dec <var data-var='date'>17</var>, <var data-var='time'>14:53</var> UTC</small><br><strong>Update</strong> - We are currently seeing live updates on some pages not working. This can impact features such as status checks and the merge button for PRs.<br /><br />Current mitigation is to refresh pages manually to see latest details.<br /><br />We are working to mitigate this and will continue to provide updates as the team makes progress.</p><p><small>Dec <var data-var='date'>17</var>, <var data-var='time'>14:51</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Issues</p> Tue, 17 Dec 2024 16:00:11 +0000 https://eu.githubstatus.com/incidents/snp5zphtmd8w https://eu.githubstatus.com/incidents/snp5zphtmd8w EU - Incident with Issues <p><small>Dec <var data-var='date'> 4</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Resolved</strong> - On December 4th, 2024 between 18:52 UTC and 19:11 UTC, several GitHub services were degraded with an average error rate of 8%.<br /><br />The incident was caused by a change to a centralized authorization service that contained an unoptimized database query. This led to an increase in overall load on a shared database cluster, resulting in a cascading effect on multiple services and specifically affecting repository access authorization checks. We mitigated the incident after rolling back the change at 19:07 UTC, fully recovering within 4 minutes. <br /><br />While this incident was caught and remedied quickly, we are implementing process improvements around recognizing and reducing risk of changes involving high volume authorization checks. 
We are investing in broad improvements to our safe rollout process, such as improving early detection mechanisms.</p><p><small>Dec <var data-var='date'> 4</var>, <var data-var='time'>19:20</var> UTC</small><br><strong>Update</strong> - Issues is operating normally.</p><p><small>Dec <var data-var='date'> 4</var>, <var data-var='time'>19:08</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Issues</p> Wed, 04 Dec 2024 19:27:35 +0000 https://eu.githubstatus.com/incidents/8z3971gvhxk4 https://eu.githubstatus.com/incidents/8z3971gvhxk4 Incident with Pages and Actions <p><small>Sep <var data-var='date'>16</var>, <var data-var='time'>22:08</var> UTC</small><br><strong>Resolved</strong> - On September 16, 2024, between 21:11 UTC and 22:20 UTC, Actions and Pages services were degraded. Customers who deploy Pages from a source branch experienced delayed runs. Approximately 1,100 runs were delayed long enough to get marked as abandoned. The runs that weren't abandoned completed successfully after we recovered from the incident. Actions jobs experienced average delays of 23 minutes, with some jobs experiencing delays as high as 45 minutes. During the course of the incident, 17% of runs were delayed by more than 5 minutes. At peak, as many as 80% of runs experienced delays exceeding 5 minutes. The root cause was a misconfiguration in the service that manages runner connections, which caused CPU throttling and led to a performance degradation in that service.<br /><br />We mitigated the incident by diverting runner connections away from the misconfigured nodes. We are working to improve our internal monitoring and alerting to reduce our time to detection and mitigation of issues like this one in the future.</p><p><small>Sep <var data-var='date'>16</var>, <var data-var='time'>21:54</var> UTC</small><br><strong>Update</strong> - Actions is operating normally.</p><p><small>Sep <var data-var='date'>16</var>, <var data-var='time'>21:37</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded availability for Actions</p> Mon, 16 Sep 2024 22:08:38 +0000 https://eu.githubstatus.com/incidents/y5sxgfqxmn0c https://eu.githubstatus.com/incidents/y5sxgfqxmn0c Disruption with some GitHub services <p><small>Aug <var data-var='date'>29</var>, <var data-var='time'>21:54</var> UTC</small><br><strong>Resolved</strong> - On August 29th, 2024, from 16:56 UTC to 21:42 UTC, we observed an elevated rate of traffic on our public edge, which triggered GitHub’s rate limiting protections. This resulted in <0.1% of users being identified as false-positives, which they experienced as intermittent connection timeouts. 
At 20:59 UTC the engineering team improved the system to remediate the false-positive identification of user traffic and return to normal traffic operations.</p><p><small>Aug <var data-var='date'>29</var>, <var data-var='time'>20:43</var> UTC</small><br><strong>Update</strong> - While we have seen a reduction in reports of users having connectivity issues to GitHub.com, we are still investigating the issue.</p><p><small>Aug <var data-var='date'>29</var>, <var data-var='time'>20:07</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate customer reports of temporary issues accessing GitHub.com</p><p><small>Aug <var data-var='date'>29</var>, <var data-var='time'>19:29</var> UTC</small><br><strong>Update</strong> - We are getting reports of users who aren't able to access GitHub.com and are investigating.</p><p><small>Aug <var data-var='date'>29</var>, <var data-var='time'>19:29</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Thu, 29 Aug 2024 21:54:47 +0000 https://eu.githubstatus.com/incidents/jfvxhjr9mgfd https://eu.githubstatus.com/incidents/jfvxhjr9mgfd Disruption with some GitHub services <p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>23:43</var> UTC</small><br><strong>Resolved</strong> - On August 28, 2024, from 21:40 to 23:43 UTC, up to 25% of unauthenticated dotcom traffic in SE Asia (representing <1% of global traffic) encountered HTTP 500 errors. We observed elevated error rates at one of our global points of presence, where geo-DNS health checks were failing. We identified unhealthy cloud hardware in the region, indicated by abnormal CPU utilization patterns. As a result, we drained the site at 23:26 UTC, which promptly restored normal traffic operations.</p><p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>22:19</var> UTC</small><br><strong>Update</strong> - We are seeing cases of user impact in some locations and are continuing to investigate.</p><p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>22:02</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Wed, 28 Aug 2024 23:43:58 +0000 https://eu.githubstatus.com/incidents/0dcn77ktj3y6 https://eu.githubstatus.com/incidents/0dcn77ktj3y6 Incident with Pull Requests, Pages and Actions <p><small>Aug <var data-var='date'>15</var>, <var data-var='time'>00:30</var> UTC</small><br><strong>Resolved</strong> - On August 14, 2024, between 23:02 UTC and 23:38 UTC, all GitHub services were inaccessible for all users.<br /><br />This was due to a configuration change that impacted traffic routing within our database infrastructure, resulting in critical services unexpectedly losing database connectivity. There was no data loss or corruption during this incident.<br /><br />We mitigated the incident by reverting the change and confirming restored connectivity to our databases. At 23:38 UTC, traffic resumed and all services recovered to full health. 
Out of an abundance of caution, we continued to monitor before resolving the incident at 00:30 UTC on August 15th, 2024.<br /><br />We will provide more details as our investigation proceeds and will post additional updates in the coming days.<br /></p><p><small>Aug <var data-var='date'>15</var>, <var data-var='time'>00:13</var> UTC</small><br><strong>Update</strong> - Git Operations is operating normally.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>23:45</var> UTC</small><br><strong>Update</strong> - The database infrastructure change is being rolled back. We are seeing improvements in service health and are monitoring for full recovery.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>23:29</var> UTC</small><br><strong>Update</strong> - We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure-related change that we are working on rolling back.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>23:19</var> UTC</small><br><strong>Update</strong> - Git Operations is experiencing degraded availability. We are continuing to investigate.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>23:16</var> UTC</small><br><strong>Update</strong> - We are investigating reports of issues with GitHub.com and the GitHub API. We will continue to keep users updated on progress towards mitigation.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>23:13</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Thu, 15 Aug 2024 00:30:15 +0000 https://eu.githubstatus.com/incidents/c5ccqg2rn3r1 https://eu.githubstatus.com/incidents/c5ccqg2rn3r1 Actions runs using large runners delayed for some customers <p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>22:10</var> UTC</small><br><strong>Resolved</strong> - On July 30th, 2024, between 13:25 UTC and 18:15 UTC, customers using Larger Hosted Runners may have experienced extended queue times for jobs that depended on a Runner with VNet Injection enabled in a virtual network within the East US 2 region. Runners without VNet Injection or those with VNet Injection in other regions were not affected. The issue was caused by an outage in a third-party provider that blocked a large percentage of VM allocations in the East US 2 region. Once the underlying issue with the third-party provider was resolved, job queue times went back to normal. We are exploring adding support for customers to define VNet Injection Runners with VNets across multiple regions to minimize the impact of outages in a single region.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>22:09</var> UTC</small><br><strong>Update</strong> - The mitigation for larger hosted runners has continued to be stable and all job delays are less than 5 minutes. We will be resolving this incident.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>21:44</var> UTC</small><br><strong>Update</strong> - We are continuing to hold this incident open while the team ensures that the mitigation put in place is stable.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>21:00</var> UTC</small><br><strong>Update</strong> - Larger hosted runner job starts are stable and occurring within expected timeframes. We are monitoring job start times in preparation to resolve this incident. 
No enqueued larger hosted runner jobs were dropped during this incident.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>20:17</var> UTC</small><br><strong>Update</strong> - Over the past 30 minutes, all larger hosted runner jobs have started in less than 5 minutes. We are continuing to investigate delays in larger hosted runner job starts</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>19:40</var> UTC</small><br><strong>Update</strong> - We are still investigating delays in customer’s larger hosted runner job starts. Nearly all jobs are starting under 5 minutes. Only 1 customer larger hosted runner job was delayed by more than 5 minutes in the past 30 minutes.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>19:04</var> UTC</small><br><strong>Update</strong> - We are seeing improvements to the job start times for larger hosted runners for customers. In the last 30 minutes no customer jobs are delayed more than 5 minutes. We will continue monitoring for full recovery.</p><p><small>Jul <var data-var='date'>30</var>, <var data-var='time'>18:19</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Tue, 30 Jul 2024 22:10:21 +0000 https://eu.githubstatus.com/incidents/v0s8w2c1wlm9 https://eu.githubstatus.com/incidents/v0s8w2c1wlm9 Linking internal teams to external IDP groups was broken for some users between 15:17-20:44 UTC <p><small>Jul <var data-var='date'>25</var>, <var data-var='time'>21:05</var> UTC</small><br><strong>Resolved</strong> - Between July 24th, 2024 at 15:17 UTC and July 25th, 2024 at 21:04 UTC, the external identities service was degraded and prevented customers from linking teams to external groups on the create/edit team page. Team creation and team edits would appear to function as normal, but the selected group would not be linked to the team after form submission. This was due to a bug in the Primer experimental SelectPanel component that was mistakenly rolled out to customers via a feature flag.<br /><br />We mitigated the incident by scaling the feature flag back down to 0% of actors.<br /><br />We are making improvements to our release process and test coverage to avoid similar incidents in the future.</p><p><small>Jul <var data-var='date'>25</var>, <var data-var='time'>21:04</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Thu, 25 Jul 2024 21:05:02 +0000 https://eu.githubstatus.com/incidents/q1xfsnrh9npj https://eu.githubstatus.com/incidents/q1xfsnrh9npj Incident with Copilot <p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Resolved</strong> - On July 13, 2024 between 00:01 and 19:27 UTC the Copilot service was degraded. During this time period, Copilot code completions error rate peaked at 1.16% and Copilot Chat error rate peaked at 63%. Between 01:00 and 02:00 UTC we were able to reroute traffic for Chat to bring error rates below 6%. During the time of impact customers would have seen delayed responses, errors, or timeouts during requests. GitHub code scanning autofix jobs were also delayed during this incident. <br /><br />A resource cleanup job was scheduled by <a href="https://azure.status.microsoft/en-us/status/history?trackingid=4L44-3F0">Azure OpenAI (AOAI) service early July 13th</a> targeting a resource group thought to only contain unused resources. 
This resource group unintentionally contained critical, still in use, resources that were then removed. The cleanup job was halted before removing all resources in the resource group. Enough resources remained that GitHub was able to mitigate while resources were reconstructed.<br /><br />We are working with AOAI to ensure mitigation is in place to prevent future impact. In addition, we will improve traffic rerouting processes to reduce time to mitigate in the future.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>19:26</var> UTC</small><br><strong>Update</strong> - Copilot is operating normally.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>18:01</var> UTC</small><br><strong>Update</strong> - Our upstream provider continues to recover and we expect services to return to normal as more progress is made. We will provide another update by 20:00 UTC.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>16:09</var> UTC</small><br><strong>Update</strong> - Our upstream provider is making good progress recovering and we are validating that services are nearing normal operations. We will provide another update by 18:00 UTC.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>11:18</var> UTC</small><br><strong>Update</strong> - Our upstream provider is gradually recovering the service. We will provide another update at 23:00 UTC.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>03:50</var> UTC</small><br><strong>Update</strong> - We are continuing to wait on our upstream provider to see full recovery. We will provide another update at 11:00 UTC</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>03:20</var> UTC</small><br><strong>Update</strong> - The error rate for Copilot chat requests remains steady at less than 10%. We are continuing to investigate with our upstream provider.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>02:20</var> UTC</small><br><strong>Update</strong> - Copilot is experiencing degraded performance. We are continuing to investigate.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>02:19</var> UTC</small><br><strong>Update</strong> - We have applied several mitigations to Copilot chat, reducing errors to less than 10% of all chat requests. We are continuing to investigate the issue with our upstream provider.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>01:32</var> UTC</small><br><strong>Update</strong> - Copilot chat is experiencing degraded performance, impacting up to 60% of all chat requests. We are continuing to investigate the issue with our upstream provider.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>00:49</var> UTC</small><br><strong>Update</strong> - Copilot chat is currently experiencing degraded performance, impacting up to 60% of all chat requests. We are investigating the issue.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>00:29</var> UTC</small><br><strong>Update</strong> - Copilot is experiencing degraded availability. 
We are continuing to investigate.</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>00:18</var> UTC</small><br><strong>Update</strong> - Copilot API chat is experiencing significant failures in requests to backend services</p><p><small>Jul <var data-var='date'>13</var>, <var data-var='time'>00:18</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> Sat, 13 Jul 2024 19:27:04 +0000 https://eu.githubstatus.com/incidents/6dh5nl8kdrnz https://eu.githubstatus.com/incidents/6dh5nl8kdrnz Incident with Copilot <p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>15:21</var> UTC</small><br><strong>Resolved</strong> - On July 11, 2024, between 10:20 UTC and 14:00 UTC Copilot Chat was degraded and experienced intermittent timeouts. This only impacted requests routed to one of our service region providers. The error rate peaked at 10% of all requests and affected 9% of users. This was due to host upgrades in an upstream service provider. While this was a planned event, processes and tooling were not in place to anticipate and mitigate this downtime.<br /><br />We are working to improve our processes and tooling for future planned events, as well as escalation paths with our upstream providers.<br /></p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>15:21</var> UTC</small><br><strong>Update</strong> - Copilot is operating normally.</p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>13:02</var> UTC</small><br><strong>Update</strong> - Copilot's Chat functionality is experiencing intermittent timeouts; we are investigating the issue.</p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>13:02</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> Thu, 11 Jul 2024 15:21:18 +0000 https://eu.githubstatus.com/incidents/bg1d9hk6wz50 https://eu.githubstatus.com/incidents/bg1d9hk6wz50 Incident with Issues and Pages <p><small>Jul <var data-var='date'> 8</var>, <var data-var='time'>19:45</var> UTC</small><br><strong>Resolved</strong> - On July 8th, 2024, between 18:18 UTC and 19:11 UTC, various services relying on static assets were degraded, including user-uploaded content on github.com, access to docs.github.com and Pages sites, and downloads of Release assets and Packages.<br /><br />The outage primarily affected users in the vicinity of New York City, USA, due to a local CDN disruption. 
<br /><br />Service was restored without our intervention.<br /><br />We are working to improve our external monitoring, which failed to detect the issue, and will be evaluating a backup mechanism to keep critical services available, such as loading assets on GitHub.com, in the event of an outage with our CDN.<br /></p><p><small>Jul <var data-var='date'> 8</var>, <var data-var='time'>19:44</var> UTC</small><br><strong>Update</strong> - Our assets are serving normally again and all impact is resolved.</p><p><small>Jul <var data-var='date'> 8</var>, <var data-var='time'>19:16</var> UTC</small><br><strong>Update</strong> - We are beginning to see recovery of our assets and are monitoring for additional impact.</p><p><small>Jul <var data-var='date'> 8</var>, <var data-var='time'>19:01</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Mon, 08 Jul 2024 19:45:20 +0000 https://eu.githubstatus.com/incidents/n5j315mvx9pl https://eu.githubstatus.com/incidents/n5j315mvx9pl We are investigating degraded performance for GitHub Enterprise Importer migrations <p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>18:09</var> UTC</small><br><strong>Resolved</strong> - On June 18th, from 16:59 UTC to 18:06 UTC, customer migrations were unavailable and failing. This impacted all in-progress migrations during that time. This issue was due to an incorrect configuration on our database cluster. We mitigated the issue by remediating the database configuration and are working with stakeholders to ensure safeguards are in place to prevent the issue going forward.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>18:04</var> UTC</small><br><strong>Update</strong> - We have applied a configuration change to our migration service as a mitigation and are beginning to see recovery and an increase in successful migration runs. We are continuing to monitor.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:48</var> UTC</small><br><strong>Update</strong> - We have identified what we believe to be the source of the migration errors and are applying a mitigation, which we expect will begin improving the migration success rate.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:15</var> UTC</small><br><strong>Update</strong> - We are investigating degraded performance for GitHub Enterprise Importer migrations. Some customers may see an increase in failed migrations. Investigation is ongoing.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Tue, 18 Jun 2024 18:09:44 +0000 https://eu.githubstatus.com/incidents/pg5z3txs4288 https://eu.githubstatus.com/incidents/pg5z3txs4288 We are investigating reports of degraded performance. <p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Resolved</strong> - On June 5, 2024, between 17:05 UTC and 19:27 UTC, the GitHub Issues service was degraded. During that time, no events related to projects were displayed on issue timelines. These events indicate when an issue was added to or removed from a project and when its status changed within a project. The data couldn’t be loaded due to a misconfiguration of the service backing these events. This happened after a scheduled secret rotation, when the misconfigured service continued using the old secrets, which had expired. 
<br /><br />We mitigated the incident by remediating the service configuration and have started simplifying the configuration to avoid similar misconfigurations in the future.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>17:22</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Wed, 05 Jun 2024 19:27:21 +0000 https://eu.githubstatus.com/incidents/khh5n39b43ln https://eu.githubstatus.com/incidents/khh5n39b43ln We are investigating reports of degraded performance. <p><small>May <var data-var='date'>20</var>, <var data-var='time'>17:05</var> UTC</small><br><strong>Resolved</strong> - Between May 19th 3:40AM UTC and May 20th 5:40PM UTC the service responsible for rendering Jupyter notebooks was degraded. During this time customers were unable to render Jupyter Notebooks.<br /><br />This occurred due to an issue with a Redis dependency which was mitigated by restarting. An issue with our monitoring led to a delay in our response. We are working to improve the quality and accuracy of our monitors to reduce the time to detection.</p><p><small>May <var data-var='date'>20</var>, <var data-var='time'>17:01</var> UTC</small><br><strong>Update</strong> - We are beginning to see recovery rendering Jupyter notebooks and are continuing to monitor.</p><p><small>May <var data-var='date'>20</var>, <var data-var='time'>16:52</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Mon, 20 May 2024 17:05:48 +0000 https://eu.githubstatus.com/incidents/bzyzwp8j2hd0 https://eu.githubstatus.com/incidents/bzyzwp8j2hd0 Incident with Actions <p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>20:17</var> UTC</small><br><strong>Resolved</strong> - On April 9, 2024, between 18:00 and 20:17 UTC, Actions was degraded and had failures for new and existing customers. During this time, Actions failed to start for 5,426 new repositories, and 1% of runs for existing customers were delayed, with half of those failing due to an infrastructure error.<br /><br />The root cause was an expired certificate which caused authentication to fail between internal services. The incident was mitigated once the cert was rotated.<br /><br />We are working to improve our automation to ensure certs are rotated before expiration.</p><p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>20:12</var> UTC</small><br><strong>Update</strong> - Actions is operating normally.</p><p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>19:43</var> UTC</small><br><strong>Update</strong> - We continue to work to resolve issues with repositories not being able to enable Actions and Actions network configuration setup not working properly. We have confirmed a fix and are in the process of deploying it to production. Another update will be shared within the next 30 minutes.</p><p><small>Apr <var data-var='date'> 9</var>, <var data-var='time'>19:18</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> Tue, 09 Apr 2024 20:17:07 +0000 https://eu.githubstatus.com/incidents/nz0xn4wggzkt https://eu.githubstatus.com/incidents/nz0xn4wggzkt We are investigating reports of degraded performance. <p><small>Apr <var data-var='date'> 5</var>, <var data-var='time'>09:18</var> UTC</small><br><strong>Resolved</strong> - On April 5, 2024, between 8:11 and 8:58 UTC a number of GitHub services were degraded, returning error responses. 
The web request error rate peaked at 6% and the API request error rate peaked at 10%. Actions had 103,660 workflow runs fail to start.<br /><br />A database load balancer change caused connection failures to various critical database clusters in one of our three data centers. The incident was mitigated once that change was rolled back.<br /><br />We have updated our deployment pipeline to better detect this problem in earlier stages of rollout and reduce impact to end users.<br /></p><p><small>Apr <var data-var='date'> 5</var>, <var data-var='time'>08:54</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Fri, 05 Apr 2024 09:18:11 +0000 https://eu.githubstatus.com/incidents/6dvnx6lf5s14 https://eu.githubstatus.com/incidents/6dvnx6lf5s14 We are investigating reports of degraded performance. <p><small>Apr <var data-var='date'> 5</var>, <var data-var='time'>08:53</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Apr <var data-var='date'> 5</var>, <var data-var='time'>08:31</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> Fri, 05 Apr 2024 08:53:39 +0000 https://eu.githubstatus.com/incidents/09j86th51zpm https://eu.githubstatus.com/incidents/09j86th51zpm Incident affecting API Requests <p><small>Jul <var data-var='date'>14</var>, <var data-var='time'>01:09</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jul <var data-var='date'>14</var>, <var data-var='time'>00:51</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for API Requests</p> Thu, 14 Jul 2022 01:09:39 +0000 https://eu.githubstatus.com/incidents/qnywzcvw7py6 https://eu.githubstatus.com/incidents/qnywzcvw7py6