Service Health
Incident affecting Vertex AI AutoML Image, Vertex AI Matching Engine, Vertex AI AutoML Tabular, Pub/Sub Lite, Hybrid Connectivity, Cloud Key Management Service, Google Cloud Deploy, Cloud Run, Vertex AI TensorBoard, Cloud Developer Tools, Virtual Private Cloud (VPC), Dialogflow CX, Cloud Workflows, Operations, Cloud Spanner, Vertex AI Explainable AI, Vertex AI Workbench User Managed Notebooks, Google Compute Engine, Cloud Memorystore, Dataproc Metastore, Cloud Logging, Certificate Authority Service, Artifact Registry, Vertex AI Vizier, Persistent Disk, Vertex AI Data Labeling, Google Cloud Dataflow, Data Catalog, Vertex AI Model Registry, Google Cloud Networking, Google Cloud Console, Eventarc, Identity and Access Management, Vertex AI Training, Google Cloud Pub/Sub, Cloud Build, Vertex AI AutoML Video, Vertex AI AutoML Text, Cloud Load Balancing, Vertex AI Pipelines, Vertex AI Feature Store, Vertex AI ML Metadata, Vertex AI Online Prediction, Vertex AI Model Monitoring, Google Cloud Tasks, Vertex AI Batch Prediction, Google Cloud Dataproc, Cloud Machine Learning, Healthcare and Life Sciences, Google Cloud SQL, Google Kubernetes Engine, GKE fleet management, Document AI Warehouse
Multiple Google Cloud Products are experiencing issues in us-west1
Incident began at 2024-02-14 09:45 and ended at 2024-02-14 12:52 (all times are US/Pacific).
Previously affected location(s)
Oregon (us-west1)
Date | Time | Description | |
---|---|---|---|
| 21 Feb 2024 | 13:39 PST | Incident Report. Summary: On 14 February 2024 from 09:45 AM to 12:52 PM US/Pacific, Google Cloud customers in us-west1 experienced control plane unavailability because of elevated latencies and errors. In addition, a few services experienced data plane unavailability for the same reason. The full list of impacted products and services is detailed below. To our Google Cloud customers whose businesses were impacted during this outage, we sincerely apologize. This is not the level of quality and reliability we strive to offer you. Root Cause: Most Google Cloud products and services use a regional metadata store to support their internal operations. The metadata store supports critical functions such as servicing customer requests, handling scale, load balancing, and admin operations, and retrieving and storing metadata, including server location information. The regional metadata store continuously manages load by automatically adjusting compute capacity in response to changes in demand. When usage increases, additional resources are added and load is also balanced automatically. However, an unexpected spike in demand exceeded the system’s ability to quickly provision additional resources. As a result, multiple Google Cloud products and services in the region experienced elevated latencies and errors until the unexpected load was isolated. Remediation and Prevention/Detection: Google engineers were alerted to this problem by our internal monitoring system and throttled the spiking workloads on the underlying regional metadata store. This allowed Google Cloud products and services to read/write state at a normal rate, allowing for healthy servicing of customer requests after the backlog of operations on the regional metadata store was processed. Google is committed to preventing a repeat of this issue in the future and is completing the following actions:
Detailed Description of Impact
Google Compute Engine
Google Cloud Pub/Sub Lite
App Engine Standard
Cloud Functions
Cloud Run
Dialogflow CX
Vertex AI products:
Google Cloud Pub/Sub
Cloud Memorystore
Eventarc
Dataproc Metastore
Google Cloud Tasks
Cloud Build
Cloud SQL
Speech-to-text
Cloud Load Balancing
Cloud Networking
Cloud Deploy
Workflows
Cloud Logging
Dataform
Certificate Authority Service
VPC and Serverless VPC Access
Cloud Dataflow
Cloud Key Management Service
Persistent Disk
Cloud Data Loss Prevention
Cloud Dataproc
Dataplex Catalog
Cloud Composer
Instances API
Added on 28 Feb 2024: Document AI Warehouse
Google Kubernetes Engine
GKE Fleet Management
To summarize, multiple Google Cloud Products experienced unavailability and/or elevated error rates for services in the us-west1 region during this issue. This is the final version of the Incident Report. |
| 15 Feb 2024 | 09:40 PST | Mini Incident Report: We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support. (All Times US/Pacific) Incident Start: 14 February 2024 10:30 Incident End: 14 February 2024 13:10 Duration: 2 hours, 40 minutes Affected Services and Features:
Regions/Zones: us-west1 Description: Customers of multiple Google Cloud products experienced increased latency and error rates in us-west1 for a period of 2 hours, 40 minutes. From preliminary analysis, the root cause of the issue has been narrowed to an internal database resource allocation issue, which caused reduced availability and increased latency for many GCP services in the region. Our engineering team mitigated the issue by isolating the problematic traffic and has implemented measures to prevent a recurrence. Google will complete a full Incident Report in the following days that will provide a detailed root cause. Customer Impact: During the time of impact, customers would have experienced high latency and error rates for GCP services in the us-west1 region. |
| 14 Feb 2024 | 13:07 PST | The core issue affecting Google Cloud Products in us-west1 has been mitigated and all the affected products have full service restoration. We understand the disruption this may have caused and sincerely apologize for any inconvenience. The root cause of the issue was identified to be an overloaded common infrastructure component. Our engineering team has mitigated the issue by isolating the traffic and has implemented measures to prevent a recurrence. If you have questions or are still experiencing issues, please open a case with the Support Team and we will work with you until this issue is resolved. We thank you for your patience while we're working on resolving the issue. We will publish a preliminary analysis of this incident once we have completed our internal investigation. |
| 14 Feb 2024 | 12:24 PST | Summary: Multiple Google Cloud Products are experiencing issues in us-west1 Description: We are experiencing an issue with multiple Google Cloud Products beginning on Wednesday, 2024-02-14 9:40 US/Pacific. Our engineers have identified and mitigated the underlying issue. Most of the affected products have recovered and we expect the remaining products to fully recover in the next 1 to 2 hours. The following services have already recovered: Google Kubernetes Engine, Cloud Pub/Sub, Virtual Private Cloud (VPC), Serverless VPC Access, Google Compute Engine, Dataplex Catalog, Cloud Interconnect, Cloud Workflows, Cloud Logging, Google Cloud Storage, Eventarc, Cloud SQL, Cloud Key Management Service, Cloud Run, Cloud Dataproc, Cloud Spanner, Dialogflow, Cloud Tasks. We will provide an update by Wednesday, 2024-02-14 13:00 US/Pacific with current details. Diagnosis: Existing customer load balancers will continue to function. New load balancers or changes to existing load balancers will not propagate configs, and changes to the configurations of load balancers may result in an error. Configuration changes cannot be made to the Regional Internal, Regional External, and Global External Application Load Balancers in the affected region. Customers may see errors when making configuration changes. Workaround: None at this time. |
| 14 Feb 2024 | 12:10 PST | Summary: Multiple Google Cloud Products are experiencing issues in us-west1 Description: We are experiencing an issue with multiple Google Cloud Products beginning on Wednesday, 2024-02-14 9:40 US/Pacific. Our engineers have identified a common infrastructure component as the root cause and we are attempting a mitigation. As the mitigation progresses, some products may see partial recovery. The following services have recovered: Google Kubernetes Engine, Cloud Pub/Sub, Virtual Private Cloud (VPC), Serverless VPC Access, Google Compute Engine, Dataplex Catalog, Cloud Interconnect, Cloud Workflows, Cloud Logging, Google Cloud Storage, Eventarc, Cloud SQL, Cloud Key Management Service, Cloud Run, Cloud Dataproc. We do not have an ETA for mitigation at this point. We will provide an update by Wednesday, 2024-02-14 12:45 US/Pacific with current details. Diagnosis: Existing customer load balancers will continue to function. New load balancers or changes to existing load balancers will not propagate configs, and changes to the configurations of load balancers may result in an error. Configuration changes cannot be made to the Regional Internal, Regional External, and Global External Application Load Balancers in the affected region. Customers may see errors when making configuration changes. Workaround: None at this time. |
| 14 Feb 2024 | 11:47 PST | Summary: Multiple Google Cloud Products are experiencing issues in us-west1 Description: We are experiencing an issue with multiple Google Cloud Products beginning on Wednesday, 2024-02-14 9:40 US/Pacific. Our engineers have identified a common infrastructure component as the root cause and we are attempting a mitigation. As the mitigation progresses, some products may see partial recovery. We do not have an ETA for mitigation at this point. We will provide an update by Wednesday, 2024-02-14 12:20 US/Pacific with current details. Diagnosis: Existing customer load balancers will continue to function. New load balancers or changes to existing load balancers will not propagate configs, and changes to the configurations of load balancers may result in an error. Configuration changes cannot be made to the Regional Internal, Regional External, and Global External Application Load Balancers in the affected region. Customers may see errors when making configuration changes. Workaround: None at this time. |
| 14 Feb 2024 | 11:29 PST | Summary: Multiple Google Cloud Products are experiencing issues in us-west1 Description: We are experiencing an issue with multiple Google Cloud Products beginning on Wednesday, 2024-02-14 9:40 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-02-14 12:30 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: Existing customer load balancers will continue to function. New load balancers or changes to existing load balancers will not propagate configs, and changes to the configurations of load balancers may result in an error. Configuration changes cannot be made to the Regional Internal, Regional External, and Global External Application Load Balancers in the affected region. Customers may see errors when making configuration changes. Workaround: None at this time. |
| 14 Feb 2024 | 11:05 PST | Summary: We are experiencing an issue with Cloud Load Balancing. Description: We are experiencing an issue with Cloud Load Balancing. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-02-14 12:44 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: None at this time. Workaround: None at this time. |
- All times are US/Pacific
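
Illustration (not part of the incident report): the Remediation section of the 21 Feb report says engineers throttled the spiking workloads on the regional metadata store. The report does not describe how that throttling works, so the following is only a minimal sketch of one common throttling technique, a token bucket, assuming one bucket per workload. The class name, the function name, the rates, and the two example workloads are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Admits about `rate` requests per second, with bursts up to `capacity`."""
    rate: float        # tokens refilled per second (hypothetical limit)
    capacity: float    # maximum burst size
    tokens: float = 0.0
    last: float = 0.0  # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the caller would reject or queue the request


def admitted(bucket: TokenBucket, arrival_times: list[float]) -> int:
    """Count how many requests arriving at the given times are admitted."""
    return sum(bucket.allow(t) for t in arrival_times)


# Hypothetical spiking workload: 100 requests within one second.
spike = [i / 100 for i in range(100)]
# Hypothetical steady workload: 1 request per second for 10 seconds.
steady = [float(i) for i in range(10)]

spike_count = admitted(TokenBucket(rate=3.0, capacity=5.0, tokens=5.0), spike)
steady_count = admitted(TokenBucket(rate=3.0, capacity=5.0, tokens=5.0), steady)
print(f"spike admitted: {spike_count} of {len(spike)}")
print(f"steady admitted: {steady_count} of {len(steady)}")
```

With these assumed limits, the steady workload is admitted in full while most of the spike is turned away, which is the general effect the report attributes to throttling: other clients can read and write state at a normal rate again.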