Lead Data Platform Engineer | AI & GenAI Engineer | Data Platform Architect
Building enterprise-scale data platforms, production ML systems, and GenAI applications across AWS and GCP.
I am a Lead Data Platform Engineer with almost 10 years of experience across data engineering, machine learning, and software engineering. I design reliable data platforms, analytics systems, and GenAI solutions that move from prototype to production with governance, observability, and cost discipline built in.
My background includes Orion Engineered Carbons, Delivery Hero, Meta, Amazon, Goldman Sachs, NorthBay Solutions, and Teradata. Across these environments, I have built platforms handling billions of events per day, modernized cloud infrastructure, and shipped AI systems that support real business workflows.
| Area | Focus |
|---|---|
| Data Platforms | Lakehouse architectures, ELT and ETL pipelines, metadata-driven workflows, governance, lineage, and scalable analytics |
| Streaming Systems | Kafka, Kinesis, Spark, and event-driven pipelines for near real-time reporting and intelligent automation |
| GenAI Engineering | RAG systems, LLM-powered agents, internal copilots, chatbots, knowledge retrieval, and business workflow automation |
| ML and MLOps | Production ML pipelines, feature and training workflows, recommendation systems, and model deployment on cloud-native stacks |
- Data Engineering: Python, SQL, Spark, dbt, Airflow, Dagster, Kafka, Kinesis, BigQuery, Redshift, Snowflake, Trino
- Cloud and Platform: AWS, GCP, Terraform, Kubernetes, serverless architecture, GitHub Actions, observability, infrastructure automation
- GenAI and AI: Amazon Bedrock, OpenAI API, LangChain, Hugging Face, RAG pipelines, vector search, prompt workflows, LLM application delivery
- Delivery and Leadership: architecture design, technical leadership, stakeholder alignment, platform modernization, production hardening
- Modern lakehouse and open table architectures using Delta Lake and Apache Iceberg patterns
- Data quality, contracts, lineage, and trustworthy pipelines for platform-scale analytics
- Real-time data products powered by Kafka, Kinesis, Spark, and event-driven processing
- Production GenAI systems using RAG, knowledge bases, agents, and evaluation-driven iteration
- Multi-cloud data and AI infrastructure with Terraform, Kubernetes, and CI/CD automation
- ML and LLM platform engineering focused on reliability, governance, and cost-aware deployment
| Role | Company | Highlights |
|---|---|---|
| Lead Data Platform Engineer, AI Tech | Orion Engineered Carbons | Built a serverless AWS data platform, metadata-driven pipelines, and GenAI solutions with Bedrock and LangChain |
| Senior Data Engineer II | Delivery Hero | Delivered large-scale analytics on AWS and GCP, real-time APIs, and Terraform-based infrastructure modernization |
| Senior Data Engineer | Meta | Developed product analytics pipelines and large-scale distributed data infrastructure |
| Senior Data Engineer | Amazon | Built predictive analytics and serverless data and ML workflows with Redshift, Glue, Lambda, Athena, and SageMaker |
| Senior ML Engineer | Goldman Sachs | Built secure ML and data workflows in regulated enterprise environments |
| Data Scientist / ML Engineer / SDE II | NorthBay Solutions | Built ML systems, OCR automation, recommendation engines, and cloud-native data solutions |
- Enterprise Serverless Data Platform: lakehouse-style AWS platform for analytics across manufacturing, operations, and finance
- LLM-Powered AI Agents and Chatbots: GenAI solutions with Amazon Bedrock, LangChain, Streamlit, and retrieval workflows
- Multi-Stream Real-Time Data Systems: Kafka and Kinesis platforms supporting large-scale event processing and operational analytics
- Global Analytics on GCP: BigQuery, Spark, dbt, and Cloud Functions for international-scale data products
- Production ML Pipelines: SageMaker, TensorFlow, and PyTorch systems for predictive analytics and recommendation use cases
- Terraform-Based Modernization: standardized infrastructure delivery across AWS and GCP with reproducible deployments
- AWS Certified Data Engineer - Associate
- Databricks Spark
- Databricks GenAI Fundamentals
- HashiCorp Terraform Associate
- Airflow
- GCP Cloud Engineer
- GCP Data Engineer
- AWS Certified Machine Learning - Specialty
- AWS Certified Solutions Architect - Associate
- AWS Certified Cloud Practitioner
Full badge list: Credly Profile
Data Platform: Airflow, dbt, Spark, Kafka, Kinesis, Delta Lake, Iceberg, BigQuery, Redshift, Snowflake
Cloud: AWS, GCP, Terraform, Kubernetes, Serverless, GitHub Actions
GenAI: Bedrock, OpenAI API, LangChain, Hugging Face, RAG, Vector Databases
ML: SageMaker, TensorFlow, PyTorch, MLflow, Kubeflow
Languages: Python, SQL, Scala, R, Bash, SPARQL
- Portfolio: shafay.deutschhier.com
- GitHub: github.com/shafaypro
- LinkedIn: pk.linkedin.com/in/imshafay
- Certifications: Credly




