Datalake
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Smart Automation Tool for building modern Data Lakes and Data Pipelines
lakeFS - Data version control for your data lake | Git for data
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licen…
Playbook of Kyuubi and Arctic Demo
Apache InLong - a one-stop, full-scenario integration framework for massive data
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
A curated list of awesome lakehouse frameworks, applications, etc.
The industry's first universal geospatial data solution tailored to data lakehouse systems
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Open Control Plane for Tables in Data Lakehouse
Open, Multi-modal Catalog for Data & AI