Amazon Redshift FAQs

NA"},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"What are the benefits of using Amazon Redshift in SageMaker for SQL analytics?","id":"product-faqs#amazon-sagemaker-sql-analytics-faq-1","customSortOrder":"1","content":"

SageMaker simplifies SQL analytics by providing a comprehensive, user-friendly platform that connects multiple data sources and streamlines data exploration. With a flexible notebook-style interface, you can access data from Amazon S3, Amazon Redshift, and other data sources, write and run queries across different engines, and directly create visualizations within the tool. The platform automatically manages your data's metadata, making it easier to understand and discover information. By integrating seamlessly with other AWS services, the platform allows you to go beyond traditional SQL analysis, turning your data into actionable insights with minimal technical complexity."},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#amazon-sagemaker-sql-analytics","name":"Amazon SageMaker SQL analytics","namespaceId":"product-faqs#redshift-faqs","description":"

Amazon SageMaker SQL analytics","metadata":{}}]}},{"fields":{"topic":"What is zero-ETL?","id":"product-faqs#zero-etl-integration-faq-1","customSortOrder":"1","content":"

Zero-ETL is a set of fully managed integrations by AWS that removes or minimizes the need to build extract, transform, and load (ETL) data pipelines. Zero-ETL makes data available in SageMaker Lakehouse and Amazon Redshift from multiple operational sources, transactional sources, and enterprise applications. ETL is the process of combining, cleaning, and normalizing data from different sources to get it ready for analytics, AI, and ML workloads. Traditional ETL processes are time-consuming and complex to develop, maintain, and scale. Instead, zero-ETL integrations facilitate point-to-point data movement without the need to create and operate ETL data pipelines. \n

Visit the What is zero-ETL? page to learn more."},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#zero-etl-integrations","name":"Zero-ETL Integrations","namespaceId":"product-faqs#redshift-faqs","description":"

Zero-ETL Integrations","metadata":{}}]}},{"fields":{"topic":"What services support zero-ETL integrations with Amazon Redshift?","id":"product-faqs#what-services-support-zero-etl-integrations-with-amazon-redshift","customSortOrder":"1","content":"

Amazon Aurora MySQL-Compatible Edition, Amazon Aurora PostgreSQL-Compatible Edition (Preview), Amazon RDS for MySQL, and Amazon DynamoDB (Limited Preview) support zero-ETL integrations with Amazon Redshift."},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"How do I load data into my Amazon Redshift data warehouse?","id":"product-faqs#amazon-load-data-serverless","customSortOrder":"1","content":"

<p>You can load data into Amazon Redshift from a range of data sources including <a href=\"https://aws.amazon.com/s3/\">Amazon S3</a>, <a href=\"https://aws.amazon.com/rds/\">Amazon RDS</a>, <a href=\"https://aws.amazon.com/dynamodb/\">Amazon DynamoDB</a>, <a href=\"https://aws.amazon.com/emr/\">Amazon EMR</a>, <a href=\"https://aws.amazon.com/glue/\">AWS Glue</a>, <a href=\"https://aws.amazon.com/datapipeline/\">AWS Data Pipeline</a>, or any SSH-enabled host on Amazon EC2 or on-premises. Amazon Redshift attempts to load your data in parallel into each compute node to maximize the rate at which you can ingest data into your data warehouse cluster. Clients can also connect to Amazon Redshift using ODBC or JDBC and issue SQL INSERT commands to insert the data. Note that this is slower than loading from S3 or DynamoDB, because those methods load data in parallel to each compute node, while SQL INSERT statements load through the single leader node. For more details on loading data into Amazon Redshift, see our <a href=\"http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html\">Getting Started Guide</a>.</p>"},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#data-ingestion-and-loading","name":"Data ingestion and loading","namespaceId":"product-faqs#redshift-faqs","description":"Data ingestion and loading","metadata":{}}]}},{"fields":{"topic":"How do I monitor the performance of my Amazon Redshift data warehouse cluster?","id":"product-faqs#amazon-monitor-cluster-redshift","customSortOrder":"1","content":"

<p>Metrics for compute utilization, storage utilization, and read/write traffic to your Amazon Redshift data warehouse cluster are available free of charge through the <a href=\"https://aws.amazon.com/console/\">AWS Management Console</a> or <a href=\"https://aws.amazon.com/cloudwatch/\">Amazon CloudWatch</a> APIs. You can also add user-defined metrics through Amazon CloudWatch’s custom metric functionality. The AWS Management Console provides a monitoring dashboard that helps you monitor the health and performance of all your clusters. Amazon Redshift also provides information on query and cluster performance through the AWS Management Console. This information shows which users and queries are consuming the most system resources, so that you can diagnose performance issues by viewing query plans and execution statistics. In addition, you can see the resource utilization on each of your compute nodes to ensure that your data and queries are well balanced across all nodes.</p>"},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#monitoring","name":"monitoring","namespaceId":"product-faqs#redshift-faqs","description":"

monitoring","metadata":{}}]}},{"fields":{"topic":"How does Amazon Redshift back up my data? How do I restore my cluster from a backup?","id":"product-faqs#amazon-backup-redshift","customSortOrder":"1","content":"

<p>Amazon Redshift RA3 clusters and Amazon Redshift Serverless use Redshift Managed Storage, which always has the latest copy of the data available. DS2 and DC2 clusters mirror the data on the cluster to ensure the latest copy is available in the event of a failure. Backups are automatically created on all Redshift cluster types and retained for 24 hours. On Amazon Redshift Serverless, recovery points are provided for the past 24 hours.<br><br>You can also create your own backups, which can be retained indefinitely. These backups can be created at any time, and Amazon Redshift automated backups or Amazon Redshift Serverless recovery points can be converted into a user backup for longer retention.<br><br>Amazon Redshift can also asynchronously replicate your snapshots or recovery points to Amazon S3 in another Region for disaster recovery.<br><br>On a DS2 or DC2 cluster, free backup storage is limited to the total size of storage on the nodes in the data warehouse cluster and only applies to active data warehouse clusters.<br><br>For example, if you have total data warehouse storage of 8 TB, we will provide at most 8 TB of backup storage at no additional charge. If you would like to extend your backup retention period beyond one day, you can do so using the <a href=\"https://aws.amazon.com/console/\">AWS Management Console</a> or the <a href=\"http://docs.aws.amazon.com/redshift/latest/APIReference/Welcome.html\">Amazon Redshift APIs</a>. For more information on automated snapshots, refer to the <a href=\"https://docs.aws.amazon.com/redshift/latest/mgmt/overview.html\">Amazon Redshift Management Guide</a>.<br><br>Amazon Redshift only backs up data that has changed, so most snapshots use only a small amount of your free backup storage. When you need to restore a backup, you have access to all the automated backups within your backup retention window. Once you choose a backup from which to restore, we will provision a new data warehouse cluster and restore your data to it.</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#backup","name":"backup","namespaceId":"product-faqs#redshift-faqs","description":"

backup","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"Are Amazon Redshift and Redshift Spectrum compatible with my preferred business intelligence software package and ETL tools?","id":"product-faqs#amazon-spectrum-availability-redshift","customSortOrder":"1","content":"

<p>Yes, Amazon Redshift uses industry-standard SQL and is accessed using standard JDBC and ODBC drivers. You can download Amazon Redshift custom JDBC and ODBC drivers from the Connect Client tab of the <a href=\"https://console.aws.amazon.com/redshift/\">Redshift Console</a>. We have validated integrations with popular <a href=\"https://aws.amazon.com/redshift/partners/\">BI and ETL vendors</a>, a number of which are offering <a href=\"https://aws.amazon.com/redshift/free-trial/\">free trials</a> to help you get started loading and analyzing your data. You can also go to the <a href=\"https://aws.amazon.com/partners/aws-marketplace/\">AWS Marketplace</a> to deploy and configure solutions designed to work with Amazon Redshift in minutes.<br>Amazon Redshift Spectrum supports all Amazon Redshift client tools. The client tools can continue to connect to the Amazon Redshift cluster endpoint using ODBC or JDBC connections. No changes are required.<br>You use exactly the same query syntax and have the same query capabilities to access tables in Redshift Spectrum as you have for tables in the local storage of your Redshift cluster. External tables are referenced using the schema name defined in the CREATE EXTERNAL SCHEMA command where they were registered.</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#querying","name":"querying","namespaceId":"product-faqs#redshift-faqs","description":"

querying","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"What happens to my data warehouse cluster availability and data durability in the event of individual node failure?","id":"product-faqs#amazon-cluster-availability-redshift","customSortOrder":"1","content":"

<p>Amazon Redshift will automatically detect and replace a failed node in your data warehouse cluster. On Dense Compute (DC) and Dense Storage (DS2) clusters, the data is stored on the compute nodes to ensure high data durability. When a node is replaced, the data is refreshed from the mirror copy on the other node. RA3 clusters and Redshift Serverless are not impacted in the same way, because the data is stored in Amazon S3 and the local drive is used only as a data cache. The data warehouse cluster will be unavailable for queries and updates until a replacement node is provisioned and added to the cluster. Amazon Redshift makes your replacement node available immediately and loads your most frequently accessed data from Amazon S3 first to allow you to resume querying your data as quickly as possible. Single-node clusters do not support data replication. In the event of a drive failure, you must restore the cluster from a snapshot on Amazon S3. We recommend using at least two nodes for production.<br>&nbsp;</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#availability","name":"availability","namespaceId":"product-faqs#redshift-faqs","description":"

availability","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"How does Amazon Redshift keep my data secure?","id":"product-faqs#amazon-security-redshift","customSortOrder":"1","content":"

<p>Amazon Redshift supports industry-leading security with built-in identity management and federation for single sign-on (SSO), multi-factor authentication, column-level access control, row-level security, role-based access control, and Amazon Virtual Private Cloud (Amazon VPC). With Amazon Redshift, your data is encrypted in transit and at rest. All Amazon Redshift security features are offered out-of-the-box at no additional cost to satisfy the most demanding security, privacy, and compliance requirements. You get the benefit of AWS supporting more security standards and compliance certifications than any other provider, including ISO 27001, SOC, HIPAA/HITECH, and FedRAMP.</p>"},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#security","name":"Security","namespaceId":"product-faqs#redshift-faqs","description":"Security","metadata":{}}]}},{"fields":{"topic":"How do I scale the size and performance of my Amazon Redshift data warehouse cluster?","id":"product-faqs#amazon-scale-size-redshift","customSortOrder":"1","content":"

<p>Amazon Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources. Amazon Redshift Serverless adjusts capacity in seconds to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads. With the Concurrency Scaling feature, you can support unlimited concurrent users and concurrent queries, with consistently fast query performance. When concurrency scaling is enabled, Amazon Redshift automatically adds cluster capacity when your cluster experiences an increase in query queuing.<br><br>For manual scaling, if you would like to increase query performance or respond to CPU, memory, or I/O overutilization, you can increase the number of nodes within your data warehouse cluster using Elastic Resize through the <a href=\"https://aws.amazon.com/console/\">AWS Management Console</a> or the <a href=\"https://docs.aws.amazon.com/redshift/latest/APIReference/API_ModifyCluster.html\">ModifyCluster</a> API. When you modify your data warehouse cluster, your requested changes will be applied immediately. Metrics for compute utilization, storage utilization, and read/write traffic to your Redshift data warehouse cluster are available free of charge through the AWS Management Console or Amazon CloudWatch APIs. You can also add user-defined metrics through <a href=\"https://aws.amazon.com/cloudwatch/\">Amazon CloudWatch</a> custom metric functionality.<br><br>With Amazon Redshift Spectrum, you can run multiple Redshift clusters accessing the same data in Amazon S3. You can use different clusters for different use cases. For example, you can use one cluster for standard reporting and another for data science queries. Your marketing team can use its own clusters, separate from those of your operations team. Redshift Spectrum automatically distributes the execution of your query to several Redshift Spectrum workers out of a shared resource pool to read and process data from Amazon S3, and pulls results back into your Redshift cluster for any remaining processing.</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#scalability-and-concurrency","name":"Scalability and concurrency","namespaceId":"product-faqs#redshift-faqs","description":"Scalability and concurrency","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"What are the use cases for data sharing?","id":"product-faqs#amazon-data-sharing-redshift","customSortOrder":"1","content":"

<p>Key use cases include:</p><ul><li>A central ETL cluster sharing data with many BI/analytics clusters to provide read workload isolation and optional charge-ability.<br>&nbsp;</li><li>A data provider sharing data with external consumers.<br>&nbsp;</li><li>Sharing common datasets, such as customers and products, across different business groups and collaborating on broad analytics and data science.<br>&nbsp;</li><li>Decentralizing a data warehouse to simplify management.<br>&nbsp;</li><li>Sharing data between development, test, and production environments.<br>&nbsp;</li><li>Accessing Redshift data from other AWS analytic services.</li></ul>"},"metadata":{"tags":[{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}},{"id":"product-faqs#redshift-faqs#data-sharing","name":"Data sharing","namespaceId":"product-faqs#redshift-faqs","description":"Data sharing","metadata":{}}]}},{"fields":{"topic":"What is Amazon Redshift Serverless?","id":"product-faqs#amazon-serverless-redshift","customSortOrder":"1","content":"

<p>Amazon Redshift Serverless is a serverless option of Amazon Redshift that makes it more efficient to run and scale analytics in seconds without the need to set up and manage data warehouse infrastructure. With Redshift Serverless, any user—including data analysts, developers, business professionals, and data scientists—can get insights from data by simply loading and querying data in the data warehouse.</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#serverless","name":"Serverless","namespaceId":"product-faqs#redshift-faqs","description":"Serverless","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"What is Amazon Redshift?","id":"product-faqs#amazon-general-redshift","customSortOrder":"1","content":"

<p>Tens of thousands of customers use Amazon Redshift every day to run SQL analytics in the cloud, processing exabytes of data for business insights. Whether your growing data is stored in operational data stores, data lakes, streaming data services, or third-party datasets, Amazon Redshift helps you securely access, combine, and share data with minimal movement or copying. Amazon Redshift is deeply integrated with AWS database, analytics, and machine learning services so that you can employ zero-ETL approaches or access data in place for near real-time analytics, build machine learning models in SQL, and enable Apache Spark analytics using data in Redshift. Amazon Redshift Serverless enables your engineers, developers, data scientists, and analysts to get started easily and scale analytics quickly in a zero-administration environment. With its Massively Parallel Processing (MPP) engine, an architecture that separates compute and storage for efficient scaling, and machine learning-driven performance innovations (for example, AutoMaterialized Views), Amazon Redshift is built for scale and delivers up to 5x better price performance than other cloud data warehouses.</p>"},"metadata":{"tags":[{"id":"product-faqs#redshift-faqs#general","name":"General","namespaceId":"product-faqs#redshift-faqs","description":"General","metadata":{}},{"id":"GLOBAL#product#redshift","name":"Amazon Redshift","namespaceId":"GLOBAL#product","description":"Amazon Redshift","metadata":{}}]}},{"fields":{"topic":"What ETL challenges does zero-ETL integration solve?","id":"product-faqs#zero-etl-integration-faq-2","customSortOrder":"2","content":"

The zero-ETL integrations solve many of the existing data movement challenges in traditional ETL processes, including: \n