🧑‍💻
Going to be in FAANG IA
  • https://www.linkedin.com/in/imshafay/
  • Berlin

shafaypro/README.md

Shafay

Lead Data Platform Engineer | AI & GenAI Engineer | Data Platform Architect

Building enterprise-scale data platforms, production ML systems, and GenAI applications across AWS and GCP.

Portfolio | GitHub | LinkedIn | Credly

Open to remote roles | Almost 10 years experience | 13+ certifications | Billions of events per day | AWS and GCP

Executive Summary

I am a Lead Data Platform Engineer with almost 10 years of experience across data engineering, machine learning, and software engineering. I design reliable data platforms, analytics systems, and GenAI solutions that move from prototype to production with governance, observability, and cost discipline built in.

My background includes Orion Engineered Carbons, Delivery Hero, Meta, Amazon, Goldman Sachs, NorthBay Solutions, and Teradata. Across these environments, I have built platforms handling billions of events per day, modernized cloud infrastructure, and shipped AI systems that support real business workflows.

What I Build

Area | Focus
Data Platforms | Lakehouse architectures, ELT and ETL pipelines, metadata-driven workflows, governance, lineage, and scalable analytics
Streaming Systems | Kafka, Kinesis, Spark, and event-driven pipelines for near real-time reporting and intelligent automation
GenAI Engineering | RAG systems, LLM-powered agents, internal copilots, chatbots, knowledge retrieval, and business workflow automation
ML and MLOps | Production ML pipelines, feature and training workflows, recommendation systems, and model deployment on cloud-native stacks

High-Signal Strengths

  • Data Engineering: Python, SQL, Spark, DBT, Airflow, Dagster, Kafka, Kinesis, BigQuery, Redshift, Snowflake, Trino
  • Cloud and Platform: AWS, GCP, Terraform, Kubernetes, serverless architecture, GitHub Actions, observability, infrastructure automation
  • GenAI and AI: Amazon Bedrock, OpenAI API, LangChain, HuggingFace, RAG pipelines, vector search, prompt workflows, LLM application delivery
  • Delivery and Leadership: architecture design, technical leadership, stakeholder alignment, platform modernization, production hardening

Current Hot Topics I Work On

  • Modern lakehouse and open table architectures using Delta Lake and Apache Iceberg patterns
  • Data quality, contracts, lineage, and trustworthy pipelines for platform-scale analytics
  • Real-time data products powered by Kafka, Kinesis, Spark, and event-driven processing
  • Production GenAI systems using RAG, knowledge bases, agents, and evaluation-driven iteration
  • Multi-cloud data and AI infrastructure with Terraform, Kubernetes, and CI/CD automation
  • ML and LLM platform engineering focused on reliability, governance, and cost-aware deployment

Experience Snapshot

Role | Company | Highlights
Lead Data Platform Engineer, AI Tech | Orion Engineered Carbons | Built a serverless AWS data platform, metadata-driven pipelines, and GenAI solutions with Bedrock and LangChain
Senior Data Engineer II | Delivery Hero | Delivered large-scale analytics on AWS and GCP, real-time APIs, and Terraform-based infrastructure modernization
Senior Data Engineer | Meta | Developed product analytics pipelines and large-scale distributed data infrastructure
Senior Data Engineer | Amazon | Built predictive analytics and serverless data and ML workflows with Redshift, Glue, Lambda, Athena, and SageMaker
Senior ML Engineer | Goldman Sachs | Built secure ML and data workflows in regulated enterprise environments
Data Scientist / ML Engineer / SDE II | NorthBay Solutions | Built ML systems, OCR automation, recommendation engines, and cloud-native data solutions

Selected Platform and AI Work

  • Enterprise Serverless Data Platform: lakehouse-style AWS platform for analytics across manufacturing, operations, and finance
  • LLM-Powered AI Agents and Chatbots: GenAI solutions with Amazon Bedrock, LangChain, Streamlit, and retrieval workflows
  • Multi-Stream Real-Time Data Systems: Kafka and Kinesis platforms supporting large-scale event processing and operational analytics
  • Global Analytics on GCP: BigQuery, Spark, DBT, and Cloud Functions for international-scale data products
  • Production ML Pipelines: SageMaker, TensorFlow, and PyTorch systems for predictive analytics and recommendation use cases
  • Terraform-Based Modernization: standardized infrastructure delivery across AWS and GCP with reproducible deployments

Featured Repositories

  • CrackingMachineLearningInterview
  • PYSHA
  • DeepLearningZerotoHero

Certifications

  • AWS Certified Data Engineer Associate
  • Databricks Spark
  • Databricks GenAI Fundamentals
  • HashiCorp Terraform Associate
  • Airflow
  • GCP Cloud Engineer
  • GCP Data Engineer
  • AWS Certified Machine Learning Specialist
  • AWS Certified Solutions Architect Associate
  • AWS Certified Cloud Practitioner

Full badge list: Credly Profile

Core Stack

Data Platform: Airflow, DBT, Spark, Kafka, Kinesis, Delta Lake, Iceberg, BigQuery, Redshift, Snowflake
Cloud: AWS, GCP, Terraform, Kubernetes, Serverless, GitHub Actions
GenAI: Bedrock, OpenAI API, LangChain, HuggingFace, RAG, Vector Databases
ML: SageMaker, TensorFlow, PyTorch, MLflow, Kubeflow
Languages: Python, SQL, Scala, R, Bash, SPARQL

Pinned

  1. scikit-learn/scikit-learn (Public)

    scikit-learn: machine learning in Python

    Python, 65.9k stars, 27k forks

  2. aws/amazon-sagemaker-examples (Public)

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Jupyter Notebook, 10.9k stars, 7k forks

  3. RDFLib/rdflib (Public)

    RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

    Python, 2.4k stars, 591 forks

  4. CrackingMachineLearningInterview (Public)

    A repository to prepare you for your machine learning interview, involving most of the questions asked by all the tech giants and local companies. Do this to Ace your Machine Learning Engineer Inte…

    HTML, 607 stars, 125 forks

  5. PYSHA (Public)

    A Simple Virtual Assistant Built in Python 3.5

    Python, 19 stars, 5 forks

  6. DeepLearningZerotoHero (Public)

    A repository for Deep Learning projects, which includes complete preparation from Novice to Expert

    Jupyter Notebook, 6 stars, 2 forks