What is Apache Iceberg?

Apache Iceberg is a distributed, community-driven, Apache 2.0-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it is fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers easy integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.

What is a transactional data lake?

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. A data transaction is a series of data exchanges that are conducted in a single operation. For example, when a customer withdraws money from a bank account, the bank conducts several data exchanges at the same time in one data transaction, including verifying that the account has a sufficient balance, verifying identity, and debiting the withdrawal from the account. A transactional data lake is a type of data lake that not only stores data at scale but also supports transactional operations, ensures that data is accurate and consistent, and allows you to track how data and data structure change over time. These properties are collectively known as Atomicity, Consistency, Isolation, and Durability (ACID):

What are the benefits of using Apache Iceberg?

Some of the key benefits of using Apache Iceberg for transactional data lakes include:

What are common use cases for Apache Iceberg?

Apache Iceberg is suited for many data lake use cases, including:

Who uses Apache Iceberg?

Data engineers, data administrators, data analysts, and data scientists are among the personas that use Apache Iceberg. Data engineers and administrators can use Apache Iceberg to design and build scalable data storage systems. Data analysts and data scientists can use Apache Iceberg to analyze large datasets efficiently.

Why should you choose Apache Iceberg?

Apache Iceberg offers a fast, efficient way to process large datasets at scale. It brings the following benefits:

What AWS services support Iceberg?

Apache Iceberg supports popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, and Presto. AWS services such as Amazon Redshift, Amazon Athena, Amazon EMR, and AWS Glue include native support for transactional data lake frameworks, including Apache Iceberg. Apache Iceberg, in combination with supported AWS services, enables a transactional data lake, often based on storage in S3.
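
The bank withdrawal example used to explain a data transaction can be illustrated with a minimal Python sketch. It is not Apache Iceberg code and does not use any Iceberg API; the names Account, TransactionError, and withdraw are hypothetical and exist only for this illustration. The sketch assumes a single in-memory account and shows the all-or-nothing idea: every check runs first, and the balance changes only if all of them pass.

```python
from dataclasses import dataclass


class TransactionError(Exception):
    """Raised when any step of the withdrawal fails (hypothetical name)."""


@dataclass
class Account:
    owner: str
    pin: str
    balance: int  # stored in cents to avoid floating-point rounding


def withdraw(account: Account, pin: str, amount: int) -> None:
    """Treat the identity check, balance check, and debit as one operation.

    All validation happens before any state is modified, so a failed
    withdrawal leaves the account exactly as it was.
    """
    if pin != account.pin:
        raise TransactionError("identity check failed")
    if amount <= 0:
        raise TransactionError("amount must be positive")
    if account.balance < amount:
        raise TransactionError("insufficient balance")
    # Every check passed: apply the single state change.
    account.balance -= amount


if __name__ == "__main__":
    acct = Account(owner="alice", pin="1234", balance=10_000)
    withdraw(acct, "1234", 2_500)  # succeeds, balance is debited
    try:
        withdraw(acct, "0000", 100)  # wrong PIN, nothing is debited
    except TransactionError as err:
        print("rejected:", err)
    print("balance in cents:", acct.balance)
```

A real transactional system also has to handle concurrent writers and crashes, which this single-threaded sketch deliberately leaves out.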
What is Apache Iceberg?
What is a transactional data lake?
What are the benefits of using Apache Iceberg?
What are common use cases for Apache Iceberg?
Who uses Apache Iceberg?
Why should you choose Apache Iceberg?
What AWS services support Iceberg?