What is Apache Iceberg?

Apache Iceberg is a distributed, community-driven, Apache 2.0-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it is fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers easy integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.

What is a transactional data lake?

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. A data transaction is a series of data exchanges that are conducted as a single operation. For example, when a customer withdraws money from a bank account, the bank conducts several data exchanges in one data transaction, including verifying that the account has a sufficient balance, verifying the customer's identity, and debiting the withdrawal from the account. A transactional data lake is a type of data lake that not only stores data at scale but also supports transactional operations, ensures that data is accurate and consistent, and allows you to track how data and data structure change over time. These properties are collectively known as atomicity, consistency, isolation, and durability (ACID).
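The bank withdrawal example above can be sketched in code. This is a minimal illustration of atomicity only, and it uses Python's built-in SQLite module as a stand-in rather than Apache Iceberg or a data lake. The table names (`accounts`, `withdrawals`), the `withdraw` function, and the `InsufficientFunds` exception are hypothetical names chosen for the example. The debit and the log entry either both take effect or are both rolled back.

```python
import sqlite3


class InsufficientFunds(Exception):
    """Raised when the account balance cannot cover the withdrawal."""


def withdraw(conn, account_id, amount):
    """Debit `amount` from `account_id` and log it, all or nothing."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO withdrawals (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        # Checking after the debit shows that a failure undoes both writes.
        if balance < 0:
            raise InsufficientFunds(account_id)
    return balance


def demo():
    """Run one successful and one rejected withdrawal on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE withdrawals (account_id INTEGER, amount INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()

    ok_balance = withdraw(conn, 1, 30)
    try:
        withdraw(conn, 1, 500)
        rejected = False
    except InsufficientFunds:
        rejected = True

    final_balance = conn.execute(
        "SELECT balance FROM accounts WHERE id = 1"
    ).fetchone()[0]
    logged = conn.execute("SELECT COUNT(*) FROM withdrawals").fetchone()[0]
    return ok_balance, rejected, final_balance, logged
```

Calling `demo()` performs a withdrawal of 30 that succeeds and a withdrawal of 500 that is rejected. After the rejected attempt, neither the debit nor its log row remains, which is the all-or-nothing behavior that a transactional data lake aims to provide for much larger datasets.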
