AWS Big Data Blog
Category: Amazon Athena
From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud
We are excited to announce that the dbt adapter for Amazon Athena is now officially supported in dbt Cloud. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. In this post, we discuss the advantages of dbt Cloud over dbt Core, common use cases, and how to get started with Amazon Athena using the dbt adapter.
Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone
Amazon DataZone recently announced the expansion of data analysis and visualization options for your project-subscribed data within Amazon DataZone using the Amazon Athena JDBC driver. In this post, you learn how the recent enhancements in Amazon DataZone facilitate a seamless connection with Tableau. By integrating Tableau with the comprehensive data governance capabilities of Amazon DataZone, we’re empowering data consumers to quickly and seamlessly explore and analyze their governed data.
Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more
Amazon DataZone now launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration empowers data users to access and analyze governed data within Amazon DataZone using familiar tools, boosting both productivity and flexibility.
Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight
In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of resource consumption and associated costs of individual applications running on your EMR cluster.
Enriching metadata for accurate text-to-SQL generation for Amazon Athena
In this post, we demonstrate the critical role of metadata in text-to-SQL generation through an example implemented for Amazon Athena using Amazon Bedrock. We discuss the challenges in maintaining the metadata as well as ways to overcome those challenges and enrich the metadata.
How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs
AppsFlyer develops a leading measurement solution focused on privacy, which enables marketers to gauge the effectiveness of their marketing activities and integrates them with the broader marketing world, managing a vast volume of 100 billion events every day. This post explores how AppsFlyer modernized their Audiences Segmentation product by using Amazon Athena.
Query AWS Glue Data Catalog views using Amazon Athena and Amazon Redshift
Glue Data Catalog views is a new feature of the AWS Glue Data Catalog that customers can use to create a common view schema and single metadata container that can hold view-definitions in different dialects that can be used across engines such as Amazon Redshift and Amazon Athena. In this blog post, we will show how you can define and query a Data Catalog view on top of open source table formats such as Iceberg across Athena and Amazon Redshift. We will also show you the configurations needed to restrict access to the underlying database and tables. To follow along, we have provided an AWS CloudFormation template.
Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK
The post illustrates the construction of a comprehensive CDC system, enabling the processing of CDC data sourced from Amazon Relational Database Service (Amazon RDS) for MySQL. Initially, we’re creating a raw data lake of all modified records in the database in near real time using Amazon MSK and writing to Amazon S3 as raw data. Later, we use an AWS Glue exchange, transform, and load (ETL) job for batch processing of CDC data from the S3 raw data lake.
How Zurich Insurance Group built a log management solution on AWS
This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group. The growth in volume and number of logging sources has been increasing exponentially over the last few years, and will continue to increase in the coming years. As a result, customers across all industries are facing multiple […]
How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics
This post is co-written with Amit Gilad, Alex Dickman and Itay Takersman from Cloudinary. Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. Data-driven decisions lead to more effective responses to unexpected events, increase innovation and allow […]