🚀 Data Engineering
BigQuery data source for Apache Spark: read data from BigQuery into DataFrames and write DataFrames back into BigQuery tables.
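A minimal sketch of a read/write round trip, assuming the spark-bigquery connector jar is available to the Spark session (e.g. via `--packages`); the staging bucket and output table are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-example").getOrCreate()

# Read a public BigQuery table into a DataFrame.
df = (spark.read.format("bigquery")
      .load("bigquery-public-data.samples.shakespeare"))

words = df.groupBy("word").sum("word_count")

# Write back to BigQuery; the indirect write method stages data in GCS first.
(words.write.format("bigquery")
      .option("temporaryGcsBucket", "my-staging-bucket")  # placeholder bucket
      .mode("overwrite")
      .save("my_dataset.shakespeare_word_counts"))        # placeholder table

spark.stop()
```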
A curated repository of links to everything you'd want to learn about data engineering.
Apache Beam is a unified programming model for batch and streaming data processing.
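A minimal Beam pipeline sketch using the Python SDK's direct runner; the same transforms would apply unchanged to an unbounded (streaming) source such as Pub/Sub:

```python
import apache_beam as beam

# A tiny batch pipeline: Map transforms work the same on bounded and
# unbounded PCollections, which is what makes the model "unified".
with beam.Pipeline() as pipeline:
    (pipeline
     | "Create" >> beam.Create(["alpha", "beta", "gamma"])
     | "Lengths" >> beam.Map(lambda word: (word, len(word)))
     | "Print" >> beam.Map(print))
```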
🧙 Mage: build, run, and manage data pipelines for integrating and transforming data.
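Assuming this is Mage (mage-ai), whose description this matches, a minimal sketch of a data-loader block following the guard-style decorator import Mage generates; the function body is a placeholder:

```python
# In Mage, pipeline steps are Python "blocks"; the tool injects decorators
# at runtime, hence the guard-style import seen in generated blocks.
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

import pandas as pd


@data_loader
def load_orders(*args, **kwargs) -> pd.DataFrame:
    # Placeholder source; a real block might call an API or a warehouse.
    return pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})
```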
Dagster: an orchestration platform for the development, production, and observation of data assets.
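A minimal sketch of Dagster's asset API: two software-defined assets, where the downstream dependency is wired from the argument name; the data is a placeholder:

```python
import pandas as pd
from dagster import asset, materialize


@asset
def raw_orders() -> pd.DataFrame:
    # Placeholder source data; a real asset might read from an API or lake.
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 5.5, 7.25]})


@asset
def order_stats(raw_orders: pd.DataFrame) -> pd.DataFrame:
    # Dagster infers the dependency on raw_orders from the parameter name.
    return raw_orders.agg({"amount": ["sum", "mean"]})


if __name__ == "__main__":
    materialize([raw_orders, order_stats])
```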
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
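A minimal Prefect 2-style sketch: tasks composed inside a flow, with retries on the flaky step; the task bodies are placeholders:

```python
from prefect import flow, task


@task(retries=2)
def fetch_numbers() -> list[int]:
    # Placeholder extract step; retries absorb transient failures.
    return [1, 2, 3]


@task
def total(numbers: list[int]) -> int:
    return sum(numbers)


@flow
def etl_flow() -> int:
    return total(fetch_numbers())


if __name__ == "__main__":
    print(etl_flow())
```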
☁️ Choose the optimal Google Compute Engine machine type or instance across the many Google Cloud Platform regions.
Apache Airflow: a platform to programmatically author, schedule, and monitor workflows.
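A minimal sketch using the TaskFlow API, assuming a recent Airflow 2.x release; the DAG name and task bodies are placeholders:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list[int]:
        return [1, 2, 3]

    @task
    def load(values: list[int]) -> None:
        print(f"loaded {len(values)} rows")

    # Calling one task with another's result defines the dependency.
    load(extract())


example_etl()
```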
Meltano: a declarative, code-first data integration engine for building and running ELT pipelines from reusable connectors, so you don't have to write, maintain, and scale your own API integrations.
A getting-started guide to Singer, the open standard for data extraction scripts (taps) and loading scripts (targets) that exchange JSON messages over stdin/stdout.
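A toy tap sketch using the singer-python helper library: it emits a schema, some records, and a state message as JSON lines on stdout, which a downstream target would consume:

```python
import singer

# Declare the shape of the "users" stream.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
}

singer.write_schema("users", schema, key_properties=["id"])
singer.write_records("users", [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "grace"},
])
# State lets the next run resume incrementally.
singer.write_state({"users": {"max_id": 2}})
```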
Scrapy, a fast high-level web crawling & scraping framework for Python.
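A minimal spider sketch against Scrapy's own tutorial site; run it with `scrapy runspider quotes_spider.py -o quotes.json`:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract each quote and author, then follow pagination links.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        yield from response.follow_all(response.css("li.next a"), self.parse)
```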
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more, and comes with built-in Hadoop support.
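A minimal two-task sketch: Luigi runs `Extract` first because `Summarize` declares it in `requires()`, and it skips any task whose output target already exists:

```python
import luigi


class Extract(luigi.Task):
    def output(self):
        return luigi.LocalTarget("data/raw.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("1\n2\n3\n")


class Summarize(luigi.Task):
    def requires(self):
        # Dependency resolution: Extract must complete before run() here.
        return Extract()

    def output(self):
        return luigi.LocalTarget("data/summary.txt")

    def run(self):
        with self.input().open() as f:
            total = sum(int(line) for line in f)
        with self.output().open("w") as f:
            f.write(f"total={total}\n")


if __name__ == "__main__":
    luigi.build([Summarize()], local_scheduler=True)
```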
chispa: PySpark test helper methods with beautiful error messages.
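A minimal sketch of chispa's DataFrame comparison; on mismatch it raises with a readable, colorized row-by-row diff:

```python
from chispa import assert_df_equality
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chispa-demo").getOrCreate()

expected = spark.createDataFrame([("ada", 1), ("grace", 2)], ["name", "id"])
actual = spark.createDataFrame([("ada", 1), ("grace", 2)], ["name", "id"])

# Passes silently; a mismatch raises with a row-by-row diff of the frames.
assert_df_equality(actual, expected)
```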
A dbt package of macros for unit testing that can be (re)used across dbt projects.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
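dbt models themselves are SQL, but dbt-core (1.5+) also exposes a programmatic Python entry point; a hedged sketch, assuming it runs from inside a dbt project, with a placeholder selector:

```python
# Requires dbt-core >= 1.5 and a working dbt project in the current directory.
from dbt.cli.main import dbtRunner

result = dbtRunner().invoke(["run", "--select", "staging"])  # placeholder selector
print("success:", result.success)
```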
Great Expectations: always know what to expect from your data.
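A minimal sketch using the classic pandas-backed API of older (pre-1.0) Great Expectations releases; the column name and bounds are placeholders:

```python
import great_expectations as ge
import pandas as pd

# Wrap a pandas DataFrame so it gains expect_* assertion methods.
df = ge.from_pandas(pd.DataFrame({"passenger_count": [1, 2, 4, 6]}))

# Each expectation returns a validation result; "success" says if it held.
result = df.expect_column_values_to_be_between("passenger_count", 1, 6)
print(result["success"])
```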
Apache Spark: a unified analytics engine for large-scale data processing.
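A minimal PySpark sketch: Spark plans the aggregation lazily and distributes it across executors; the data is a placeholder:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-demo").getOrCreate()

df = spark.createDataFrame(
    [("books", 12.0), ("books", 3.5), ("games", 7.25)],
    ["category", "price"],
)

# Nothing executes until show() triggers the distributed job.
df.groupBy("category").agg(F.sum("price").alias("revenue")).show()

spark.stop()
```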
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io).
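A minimal sketch using the trino-python-client (`pip install trino`) against the built-in `tpch` sample catalog; host and user are placeholders:

```python
import trino

conn = trino.dbapi.connect(
    host="localhost",  # placeholder coordinator host
    port=8080,
    user="demo",       # placeholder user
    catalog="tpch",    # sample catalog bundled with many Trino setups
    schema="tiny",
)

cur = conn.cursor()
cur.execute("SELECT nationkey, name FROM nation ORDER BY nationkey LIMIT 5")
for row in cur.fetchall():
    print(row)
```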