Amazon Athena FAQs

You can invoke your SageMaker AI models in an Athena SQL query to run inference. The ability to use ML models in SQL queries makes complex tasks such as anomaly detection, customer cohort analysis, and sales predictions as simple as writing a SQL query. Athena makes it simple for anyone with SQL experience to run ML models deployed on SageMaker AI.
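As a minimal sketch of what such a query can look like, the helper below assembles an Athena statement using the `USING EXTERNAL FUNCTION ... SAGEMAKER` clause. The function name, column, table, and endpoint name are hypothetical placeholders, and the helper only builds the SQL text; it does not run anything.

```python
def build_inference_query(function_name, column, column_type,
                          return_type, endpoint_name, table):
    """Build an Athena SQL statement that calls a SageMaker AI endpoint per row.

    Values are interpolated directly, so pass trusted identifiers only.
    """
    return (
        f"USING EXTERNAL FUNCTION {function_name}({column} {column_type}) "
        f"RETURNS {return_type} "
        f"SAGEMAKER '{endpoint_name}' "
        f"SELECT {column}, {function_name}({column}) AS prediction "
        f"FROM {table}"
    )


# Hypothetical names: replace with your own endpoint, table, and column.
query = build_inference_query(
    function_name="predict_churn",
    column="age",
    column_type="INTEGER",
    return_type="DOUBLE",
    endpoint_name="my-hypothetical-endpoint",
    table="sampledb.customers",
)
print(query)
```

The declared input and return types need to match what the deployed model actually accepts and returns.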

How do I control access to my data?

Amazon Athena supports fine-grained access control with Amazon SageMaker Lakehouse, which lets you centrally manage permissions and access control for data catalog resources. You can enforce fine-grained access control policies in Athena queries for data stored in any supported file format using table formats such as Apache Iceberg, Apache Hudi, and Apache Hive, and for federated data sources that are registered with Amazon SageMaker Lakehouse. With Athena, you get the flexibility to choose the table and file format best suited to your use case, along with the benefit of centralized data governance to secure data access. For example, you can use the Iceberg table format to store data in your S3 data lake for reliable write transactions at scale, together with row-level security filters in Lake Formation, so that data analysts in different countries can access data only for customers located in their own country, which helps meet regulatory requirements. Regardless of table format or federated data source type, you use the same feature set in Amazon SageMaker Lakehouse to govern your data, so users learn a few governance concepts and apply them everywhere. Athena also allows you to control access to your data by using AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and S3 bucket policies. With IAM policies, you can grant IAM users fine-grained control over your S3 buckets. By controlling access to data in S3, you can restrict users from querying it using Athena.

What is Amazon Athena?

Athena is an interactive analytics service that makes it simple to analyze data in Amazon Simple Storage Service (S3) using SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately. You don’t even need to load your data into Athena; it works directly with data stored in Amazon S3. Amazon Athena for SQL uses Trino and Presto with full standard SQL support and works with various standard data formats, including CSV, JSON, Apache ORC, Apache Parquet, and Apache Avro. Athena for Apache Spark supports SQL and allows you to use Apache Spark, an open-source, distributed processing system used for big data workloads. To get started, log in to the Athena Management Console and start interacting with your data using the query editor or notebooks.

What is the difference between Athena, Amazon EMR, and Amazon Redshift?

Query services like Athena, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR all address different needs and use cases. You just need to choose the right tool for the job. Amazon Redshift provides the fastest query performance for enterprise reporting and business intelligence workloads, particularly those involving complex SQL with multiple joins and subqueries. Amazon EMR makes it simpler and more cost effective to run highly distributed processing frameworks, such as Apache Hadoop, Spark, and Presto, than on-premises deployments. Amazon EMR is flexible: you can run custom applications and code, and define specific compute, memory, storage, and application parameters to meet your analytic requirements. Athena provides a simplified way to run interactive queries on data in S3 without the need to set up or manage any servers.

What is Amazon Athena for Apache Spark?

Athena supports the Apache Spark framework, giving data analysts and data engineers the interactive, fully managed experience of Athena. Apache Spark is a popular open-source, distributed processing system that is built for fast analytics workloads against data of any size and offers a rich system of open-source libraries. You can build Spark applications in expressive languages, such as Python, using a simplified notebook experience in the Athena console or through Athena APIs. You can query data from various sources, chain together multiple calculations, and visualize the results of your analyses. For interactive Spark applications, you spend less time waiting and are more productive because Athena starts running applications in under a second. Customers get a simplified, purpose-built Spark experience that minimizes the work required for version upgrades, performance tuning, and integration with other AWS services.

How do I control access to my data?

Amazon Athena supports fine-grained access control with AWS Lake Formation, which lets you centrally manage permissions and access control for data catalog resources in your S3 data lake. You can enforce fine-grained access control policies in Athena queries for data stored in any supported file format using table formats such as Apache Iceberg, Apache Hudi, and Apache Hive. You get the flexibility to choose the table and file format best suited to your use case, along with the benefit of centralized data governance to secure data access when using Athena. For example, you can use the Iceberg table format to store data in your S3 data lake for reliable write transactions at scale, together with row-level security filters in Lake Formation, so that data analysts in different countries can access data only for customers located in their own country, which helps meet regulatory requirements. The expanded support for table and file formats does not require any change in how you set up fine-grained access control policies in Lake Formation, but it does require Athena engine version 3, which offers new features and improved query performance. Athena also allows you to control access to your data by using AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and S3 bucket policies. With IAM policies, you can grant IAM users fine-grained control over your S3 buckets. By controlling access to data in S3, you can restrict users from querying it using Athena.

Which use cases does Athena support for embedded ML?

Athena use cases for ML span different industries, as in the following examples. Financial risk data analysts can run what-if analysis and Monte Carlo simulations. Business analysts might run linear regression or forecasting models to predict future values to help them create richer and forward-looking business dashboards that forecast revenues. Marketing analysts can use k-means clustering models to help determine their different customer segments. Security analysts can use logistic regression models to find anomalies and detect security incidents from logs.

What is a federated query?

If you have data in sources other than S3, you can use Athena to query the data in place or build pipelines that extract data from multiple data sources and store it in S3. With Athena Federated Query, you can run SQL queries across data stored in relational, nonrelational, object, and custom data sources.
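To illustrate, a federated query typically refers to each table by catalog, database, and table name, so one statement can join an S3-backed table with a table exposed by a connector. The sketch below only composes such a statement; `hypothetical_mysql_catalog` and all database, table, and column names are made-up placeholders, and `awsdatacatalog` is assumed to be the catalog that holds the S3 table.

```python
def qualified(catalog, database, table):
    """Return a double-quoted catalog.database.table reference."""
    return ".".join(f'"{part}"' for part in (catalog, database, table))


# Assumed catalog for S3 data, plus a hypothetical connector catalog.
orders = qualified("awsdatacatalog", "sales", "orders")
customers = qualified("hypothetical_mysql_catalog", "crm", "customers")

# Join the two sources in a single SQL statement.
federated_sql = (
    "SELECT o.order_id, c.customer_name "
    f"FROM {orders} AS o "
    f"JOIN {customers} AS c ON o.customer_id = c.customer_id"
)
print(federated_sql)
```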


How is Athena priced?

With Athena, you can choose to pay per query based on data scanned or based on the compute needed by your queries. Per-query pricing is based on the amount of data, in terabytes (TB), scanned by the query. You can store data in various formats on S3. If you compress your data, partition it, or convert it to a columnar storage format, you’ll pay less because your queries scan less data. Converting data to a columnar format allows Athena to read only the columns it needs to process the query. With Provisioned Capacity, you pay an hourly price for query processing capacity, not for data scanned. You can use per-query billing and compute-based billing within the same account. For more details, review the Amazon Athena pricing page.
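The arithmetic behind per-query billing can be sketched as follows. The price per TB here is a placeholder rather than a quoted price, the binary definition of a terabyte is an assumption, and the "two of ten equally sized columns" scenario is invented purely to show why reading fewer columns lowers the bill.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabytes


def per_query_cost(bytes_scanned, price_per_tb):
    """Cost of one query under per-query billing, given a price per TB scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


PRICE_PER_TB = 5.0  # placeholder value; check the pricing page for real numbers

# A row-oriented file must be read in full: 1 TB scanned.
row_format_bytes = 1 * BYTES_PER_TB
# A columnar file lets the engine read only 2 of 10 equally sized columns.
columnar_bytes = row_format_bytes * 2 / 10

full_cost = per_query_cost(row_format_bytes, PRICE_PER_TB)
columnar_cost = per_query_cost(columnar_bytes, PRICE_PER_TB)
print(f"full scan: {full_cost:.2f}, two of ten columns: {columnar_cost:.2f}")
```

Compression and partitioning reduce `bytes_scanned` in the same way, so their savings multiply with the column-pruning effect.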


How do I create tables and schemas for my data on S3?

Athena uses Apache Hive DDL to define tables. You can run DDL statements using the Athena console, with an ODBC or JDBC driver, through the API, or using the Athena create table wizard. If you use the Data Catalog with Athena, you can also use AWS Glue crawlers to automatically infer schemas and partitions. An AWS Glue crawler connects to a data store, progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Data Catalog with this metadata. Crawlers can run periodically to detect the availability of new data and changes to existing data, including table definition changes. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions. You can customize AWS Glue crawlers to classify your own file types.

When you create a new table schema in Athena, the schema is stored in the Data Catalog and used when running queries, but it does not modify your data in S3. Athena uses an approach known as schema-on-read, which allows you to project your schema onto your data when you run a query. This decreases the need for any data loading or transformation. Learn more about creating tables.
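As an illustration of schema-on-read, the sketch below composes a Hive-style `CREATE EXTERNAL TABLE` statement that points at existing files without touching them. The database, table, columns, and bucket are hypothetical, and Parquet is assumed as the storage format; the helper only produces the DDL text.

```python
def build_create_table(database, table, columns, partitions, location):
    """Compose Hive DDL for an external, partitioned Parquet table.

    columns and partitions are lists of (name, type) pairs.
    """
    cols = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        f")\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema and S3 location.
ddl = build_create_table(
    database="weblogs",
    table="requests",
    columns=[("request_id", "string"), ("status", "int")],
    partitions=[("dt", "string")],
    location="s3://hypothetical-bucket/requests/",
)
print(ddl)
```

Dropping a table defined this way removes only the catalog entry; the files under the S3 location stay in place.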

Which kinds of queries does Athena support?

Athena supports ANSI SQL queries. Athena uses Trino, an open-source, in-memory, distributed SQL engine, and can handle complex analysis, including large joins, window functions, and arrays.

How do you access Athena?

Amazon Athena for SQL can be accessed through the AWS Management Console, the AWS SDK and CLI, or Athena's ODBC or JDBC driver. You can programmatically run queries and add tables or partitions using the ODBC or JDBC driver.
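For SDK access, a minimal sketch with the Python SDK (boto3, a third-party package) might look like the following. The request fields match the Athena `StartQueryExecution` API; the database, bucket, and query text are hypothetical, and `primary` is assumed to be the workgroup in use. The request is built separately so it can be inspected without AWS credentials.

```python
def build_query_request(sql, database, output_location, workgroup="primary"):
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def run_query(request):
    """Submit the request and return the query execution ID.

    Requires boto3 and valid AWS credentials; imported lazily so the
    request builder above works without either.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical database, table, and results bucket.
request = build_query_request(
    sql="SELECT status, COUNT(*) FROM requests GROUP BY status",
    database="weblogs",
    output_location="s3://hypothetical-results-bucket/athena/",
)
print(request)
```

Query execution is asynchronous, so a caller would normally poll the execution status with the returned ID before fetching results.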

What can I do with Athena?

With Athena, you can analyze data stored in S3 and in 30 different data sources, including on-premises data sources or other cloud systems. You can use Athena to run interactive analytics using ANSI SQL or Python without the need to aggregate or load the data into Athena. Athena can process unstructured, semi-structured, and structured datasets. Examples include CSV, JSON, Avro, or columnar data formats such as Parquet and ORC. Amazon Athena for SQL integrates with Amazon QuickSight for visualizing your data or creating dashboards. You can also use Athena to generate reports or explore data with business intelligence tools or SQL clients, connected with an ODBC or JDBC driver.

What is the underlying technology behind Athena for SQL?

Athena for SQL uses Trino with full standard SQL support and works with various standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena can handle complex analysis, including large joins, window functions, and arrays. With Amazon Athena SQL engine version 3 built on Trino, we continue to increase performance and provide new features, similar to our approach on Amazon Athena engine version 2 built on Presto. One of the most exciting aspects of v3 is its new continuous integration approach to open source software management that will keep customers up to date with Trino and PrestoDB projects. We aim to stay within 60-90 days of open-source Trino launches. The Athena development team is actively contributing bug fixes and security, scalability, performance, and feature enhancements back to these open-source code bases, so anyone using Trino, Presto, and Apache Iceberg can benefit from the team’s contributions.

How does the Athena SQL support compare to Redshift, and how do I choose between the two services?

Amazon Athena and Amazon Redshift Serverless address different needs and use cases, even though both services are serverless and support SQL users.

With its Massively Parallel Processing (MPP) architecture that separates storage and compute, and its machine learning-led automatic optimization capabilities, a data warehouse such as Amazon Redshift, whether it's serverless or provisioned, is a great choice for customers that need the best price performance at any scale for complex BI and analytics workloads. Redshift is best suited for analytics at scale on massive structured and semi-structured datasets. It performs well for enterprise reporting and business intelligence workloads, particularly those involving extremely complex SQL with multiple joins and subqueries. Redshift offers deep integration with AWS database, analytics, and ML services, so customers can access data in place, or ingest or move data easily into the warehouse for high-performance analytics, through minimal-ETL and no-code methods. With federated query capabilities, Amazon Redshift Spectrum, integration with Amazon Aurora, AWS Data Exchange, streaming data services, and others, Redshift lets you use data from multiple sources, combine it with the data in the warehouse, and conduct analytics and machine learning on top of it. Redshift offers both provisioned and serverless options to get started with analytics easily without managing infrastructure.

Athena is well suited for interactive analytics and exploration of data in Amazon Simple Storage Service (S3) or any data source through an extensible connector framework (which includes over 30 out-of-the-box connectors for applications and on-premises or other cloud analytics systems) with an easy-to-use SQL syntax. Amazon Athena is built on open-source engines and frameworks such as Spark, Presto, and Apache Iceberg, giving customers the flexibility to use Python or SQL or work with open data formats. If customers want to do interactive analytics using open-source frameworks and data formats, Amazon Athena is a great place to start. It is completely serverless, meaning there’s no infrastructure to manage or set up. The openness of Athena increases data portability, allowing customers to move data among different applications, programs, and even cloud service providers. It has recently adopted a new continuous integration approach to open-source software management that will constantly integrate the latest features from the Trino, PrestoDB, and Apache Iceberg projects.

Which data formats does Athena support?

Athena supports various data formats, such as CSV, TSV, JSON, and text files, and also supports open-source columnar formats, such as ORC and Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce your costs by compressing, partitioning, and using columnar formats.

Can I use QuickSight with Athena?

Yes. Athena integrates with QuickSight, so you can seamlessly visualize your data stored in S3.

Why should I use federated queries in Athena?

Organizations often store data in a data source that meets the needs of their applications or business processes. These can include relational, key-value, document, in-memory, search, graph, time-series, and ledger databases, in addition to an S3 data lake. Performing analytics on such diverse sources can be complex and time consuming because it typically requires learning new programming languages or database constructs and building complex pipelines to extract, transform, and duplicate data before it can be used for analysis. Athena reduces this complexity by allowing you to run SQL queries on the data where it is. You can use well-known SQL constructs to query data across multiple data sources for quick analysis, or use scheduled SQL queries to extract and transform data from multiple data sources and store it in S3 for further analysis.

Which ML models can be used with Athena?

Athena can invoke any ML model that is deployed on SageMaker. You have the flexibility to train your own model using your proprietary data, or use a model that is pretrained and deployed on SageMaker. For example, cluster analysis would likely be trained on your own data because you want to categorize new records into the same categories that you used for previous records. Alternatively, for predicting real-world sports events, you could use a publicly available model because the training data used would be in the public domain already. Domain-specific or industry-specific predictions will typically be trained on your own data in SageMaker, while undifferentiated ML needs might use external models.

Can Athena query encrypted data in S3?

Yes, you can query data that’s encrypted using server-side encryption (SSE) with S3-managed encryption keys, SSE with AWS Key Management Service (KMS)-managed keys, and client-side encryption (CSE) with keys managed by AWS KMS. Athena also integrates with AWS KMS and provides you with an option to encrypt your result sets.
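Result-set encryption is requested through the result configuration passed when a query is started. The sketch below builds that configuration using the `EncryptionOption` values from the Athena API (`SSE_S3`, `SSE_KMS`, `CSE_KMS`); the bucket and KMS key ARN are hypothetical, and the validation rules are a simplification made for this example.

```python
VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def encrypted_result_configuration(output_location, option="SSE_KMS", kms_key=None):
    """Build a ResultConfiguration dict that asks Athena to encrypt query results."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"unsupported encryption option: {option}")
    # The two KMS-based options need a key; SSE_S3 uses S3-managed keys.
    if option != "SSE_S3" and not kms_key:
        raise ValueError(f"{option} requires a KMS key ARN or ID")
    encryption = {"EncryptionOption": option}
    if kms_key:
        encryption["KmsKey"] = kms_key
    return {
        "OutputLocation": output_location,
        "EncryptionConfiguration": encryption,
    }


# Hypothetical bucket and key ARN.
config = encrypted_result_configuration(
    "s3://hypothetical-results-bucket/athena/",
    option="SSE_KMS",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/hypothetical-key-id",
)
print(config)
```

The resulting dictionary would be passed as the `ResultConfiguration` argument of a `StartQueryExecution` request.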

Why do I get charged less when I use a columnar format?

With per-query billing, Athena charges based on the amount of data scanned per query. Compressing your data allows Athena to scan less data. Converting your data to columnar formats allows Athena to read only the columns required to process the query. Partitioning your data also allows Athena to restrict the amount of data scanned. This leads to cost savings and improved performance. For more details, review the Amazon Athena pricing page.

Why should I use Athena for Apache Spark?

Use Athena for Apache Spark when you need an interactive, fully managed analytics experience and tight integration with AWS services. You can use Spark to perform analytics in Athena using familiar, expressive languages such as Python and the growing environment of Spark packages. You can also submit your Spark applications through Athena APIs or in simplified notebooks in the Athena console, and begin running Spark applications in under a second without setting up and tuning the underlying infrastructure. As with its SQL query capabilities, Athena offers a fully managed Spark experience and handles performance tuning, machine configurations, and software patching automatically, so you do not need to worry about keeping current with version upgrades. Athena is also tightly integrated with other analytics services in the AWS system, such as Data Catalog. Therefore, you can create Spark applications on data in S3 data lakes by referencing tables from your Data Catalog.

Amazon Athena for Apache Spark","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Can I train my ML model using Athena?","id":"product-faqs#train-ml-model-in-athena","customSortOrder":"3","content":"

You cannot train or deploy ML models on SageMaker AI using Athena. Instead, train your ML model on SageMaker AI, or use an existing pretrained model that is deployed on SageMaker AI, and then invoke it from Athena. Read the documentation detailing training steps on SageMaker AI."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#machine-learning","name":"Machine learning","namespaceId":"product-faqs#athena-faqs","description":"

Machine learning","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Which data sources are supported?","id":"product-faqs#data-sources-supported","customSortOrder":"3","content":"

Athena provides built-in connectors to 30 popular AWS, on-premises, and other cloud data stores, including Amazon Redshift, Amazon DynamoDB, Google BigQuery, Google Cloud Storage, Azure Synapse, Azure Data Lake Storage, Snowflake, and SAP HANA. You can use these connectors to enable SQL analytics use cases on structured, semistructured, object, graph, time series, and other data storage types. For a list of supported sources, see Using Athena data source connectors. \n


You can also use the Athena data connector SDK to create a custom data source connector and query it with Athena. Get started by reviewing the documentation and example connector implementation."},"metadata":{"tags":[{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}},{"id":"product-faqs#athena-faqs#federated-query","name":"Federated query","namespaceId":"product-faqs#athena-faqs","description":"

Federated query","metadata":{}}]}},{"fields":{"topic":"How does Athena for SQL store table definitions and schema?","id":"product-faqs#how-does-athena-store-table-definitions-schema","customSortOrder":"3","content":"

Athena for SQL uses a managed AWS Glue Data Catalog to store information and schemas about the databases and tables that you create for your data stored in S3. In Regions where AWS Glue is available, you can upgrade to using the Data Catalog with Athena. In Regions where AWS Glue is not available, Athena uses an internal catalog. \n

You can modify the catalog using DDL statements or through the AWS Management Console. Any schemas that you define are automatically saved unless you explicitly delete them. Athena uses schema-on-read technology, which means that your table definitions are applied to your data in S3 when queries are run. There’s no data loading or transformation required. You can delete table definitions and schema without impacting the underlying data stored in S3."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#athena-for-sql","name":"Amazon Athena for SQL","namespaceId":"product-faqs#athena-faqs","description":"

Tag to identify Amazon Athena for SQL","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"When should I use Amazon EMR versus Athena?","id":"product-faqs#when-should-i-use-amazon-emr-vs-athena","customSortOrder":"3","content":"

Amazon EMR goes far beyond just running SQL queries. With Amazon EMR, you can run various scale-out data processing tasks for applications, such as machine learning (ML), graph analytics, data transformation, streaming data, and virtually anything that you can code. Use Amazon EMR if you use custom code to process and analyze large datasets with the latest big data processing frameworks, such as Apache HBase, Spark, Hadoop, or Presto. Amazon EMR gives you full control over the configuration of your clusters and the software installed on them. \n

You should use Athena if you want to run interactive SQL queries against data on S3 without having to manage any infrastructure or clusters."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#when-to-use-athena-vs-other-big-data-services","name":"When to use Athena vs other big data services","namespaceId":"product-faqs#athena-faqs","description":"

When to use Athena vs other big data services","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Does Athena support other business intelligence (BI) tools and SQL clients?","id":"product-faqs#business-intelligence-tools-support","customSortOrder":"3","content":"

Yes. Athena comes with ODBC and JDBC drivers that you can use with other BI tools and SQL clients. Learn more about using an ODBC or JDBC driver with Athena."},"metadata":{"tags":[{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}},{"id":"product-faqs#athena-faqs#querying-data-formats-and-multiclouds","name":"Querying, data formats, and multicloud","namespaceId":"product-faqs#athena-faqs","description":"

Querying, data formats, and multicloud","metadata":{}}]}},{"fields":{"topic":"How do I get started with Athena?","id":"product-faqs#how-do-i-get-started-with-athena","customSortOrder":"3","content":"

To get started with Athena, log in to the AWS Management Console for Athena and create your schema by writing Data Definition Language (DDL) statements on the console or by using a create table wizard. You can then start querying data using a built-in query editor. Athena queries data directly from S3, so there’s no loading required."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#general","name":"General","namespaceId":"product-faqs#athena-faqs","description":"

General","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Which kinds of data types does Athena support?","id":"product-faqs#data-types-supported-by-athena","customSortOrder":"3","content":"

Athena supports both simple data types, such as INTEGER, DOUBLE, and VARCHAR, and complex data types, such as MAP, ARRAY, and STRUCT."},"metadata":{"tags":[{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}},{"id":"product-faqs#athena-faqs#creating-tables-date-formats-and-partitions","name":"Creating tables, data formats and partitions","namespaceId":"product-faqs#athena-faqs","description":"

Creating tables, data formats, and partitions","metadata":{}}]}},{"fields":{"topic":"Is Athena highly available?","id":"product-faqs#is-athena-highly-available","customSortOrder":"3","content":"

Yes. Athena is highly available and runs queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable. Athena uses S3 as its underlying data store, making your data highly available and durable. S3 provides durable infrastructure to store important data. Your data is redundantly stored across multiple facilities and multiple devices in each facility."},"metadata":{"tags":[{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}},{"id":"product-faqs#athena-faqs#security-and-availability","name":"Security and availability","namespaceId":"product-faqs#athena-faqs","description":"

Security and availability","metadata":{}}]}},{"fields":{"topic":"How do I lower my costs?","id":"product-faqs#how-do-i-lower-my-costs","customSortOrder":"3","content":"

With per query billing, you can save 30% to 90% per query and get better performance by compressing, partitioning, and converting your data into columnar formats. Each of these operations reduces the amount of data scanned and time required for execution. These operations are recommended when using Provisioned Capacity, too, because they often reduce the amount of time a query spends executing."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#pricing-and-billing","name":"Pricing and billing","namespaceId":"product-faqs#athena-faqs","description":"

Pricing and billing","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"How do I start working with Athena for Apache Spark?","id":"product-faqs#how-do-i-start-working-with-athena-for-spark","customSortOrder":"3","content":"

To get started with Athena for Apache Spark, you can start a notebook in the Athena console or start a session using the AWS Command Line Interface (CLI) or Athena API. In your notebook, you can start submitting and shutting down Spark applications using Python. Athena also integrates with Data Catalog, so you can work with any data source referenced in the catalog, including data directly in S3 data lakes. Using notebooks, you can query data from various sources, chain together multiple calculations, and visualize the results of your analyses. For your Spark applications, you can check the execution status and review logs and execution history in the Athena console."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#athena-for-apache-spark","name":"Athena for Apache Spark","namespaceId":"product-faqs#athena-faqs","description":"

Amazon Athena for Apache Spark","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Can I run inference on models deployed on other services such as Comprehend, Forecasting, or Models deployed on my own EC2 cluster?","id":"product-faqs#run-inference-on-models-deployed-on-other-services","customSortOrder":"4","content":"

Athena only supports invoking ML models deployed on SageMaker AI. We welcome feedback on which other services you want to use with Athena. Email your feedback to: [email protected]."},"metadata":{"tags":[{"id":"product-faqs#athena-faqs#machine-learning","name":"Machine learning","namespaceId":"product-faqs#athena-faqs","description":"

Machine learning","metadata":{}},{"id":"GLOBAL#product#athena","name":"Amazon Athena","namespaceId":"GLOBAL#product","description":"Amazon Athena","metadata":{}}]}},{"fields":{"topic":"Why should I upgrade to Data Catalog?","id":"product-faqs#why-should-i-upgrade-to-data-catalog","customSortOrder":"4","content":"

AWS Glue is a fully managed extract, transform, and load (ETL) service. AWS Glue has three main components: 1) a crawler that automatically scans your data sources, identifies data formats, and infers schemas, 2) a fully managed ETL service that allows you to transform and move data to various destinations, and 3) a Data Catalog that stores metadata about databases and tables stored either in S3 or in an ODBC- or JDBC-compliant data store. To use the benefits of AWS Glue, you must upgrade from using Athena’s internal Data Catalog to the Glue Data Catalog. \n

Benefits of upgrading to the Data Catalog include the following: \n