Quang-Vinh 🍵 • Canada

Stars

🚀 Data Engineering

21 repositories

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Java • 386 stars • 201 forks • Updated Feb 14, 2025

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook • 26,603 stars • 5,443 forks • Updated Jan 6, 2025

Apache Beam is a unified programming model for Batch and Streaming data processing.

Java • 7,995 stars • 4,292 forks • Updated Feb 17, 2025

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Python • 8,142 stars • 818 forks • Updated Feb 15, 2025

An orchestration platform for the development, production, and observation of data assets.

Python • 12,532 stars • 1,590 forks • Updated Feb 18, 2025

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Python • 18,286 stars • 1,709 forks • Updated Feb 17, 2025

☁️ Choose the optimal Google Compute Engine machine type or instance in the many Google Cloud Platform regions

Perl • 280 stars • 11 forks • Updated Feb 14, 2025

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python • 38,798 stars • 14,663 forks • Updated Feb 18, 2025

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Python • 1,953 stars • 171 forks • Updated Feb 17, 2025

This repository is a getting started guide to Singer.

Makefile • 1,287 stars • 146 forks • Updated Sep 3, 2024

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python • 54,195 stars • 10,652 forks • Updated Feb 16, 2025

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

Python • 18,099 stars • 2,411 forks • Updated Feb 1, 2025

PySpark test helper methods with beautiful error messages

Python • 662 stars • 70 forks • Updated Jan 15, 2025

Macros that generate dbt code

Makefile • 523 stars • 113 forks • Updated Jan 23, 2025

Utility functions for dbt projects.

Makefile • 1,451 stars • 511 forks • Updated Jan 24, 2025

This dbt package contains macros to support unit testing that can be (re)used across dbt projects.

Shell • 432 stars • 79 forks • Updated Feb 11, 2025

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Python • 10,373 stars • 1,678 forks • Updated Feb 17, 2025

Always know what to expect from your data.

Python • 10,201 stars • 1,559 forks • Updated Feb 17, 2025

Apache Iceberg

Java • 6,901 stars • 2,374 forks • Updated Feb 17, 2025

Apache Spark - A unified analytics engine for large-scale data processing

Scala • 40,548 stars • 28,503 forks • Updated Feb 18, 2025

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java • 10,846 stars • 3,096 forks • Updated Feb 18, 2025