rehan243/README.md

Rehan Malik

Senior AI/ML Engineer · Cloud Solution Architect (AWS) · Open to Opportunities

5+ years building production AI systems at enterprise scale — GenAI, LLMs, RAG, RLHF, Computer Vision, Voice AI, Cloud Architecture.

LinkedIn Kaggle Email


About Me

I’m a Senior AI/ML Engineer with 5+ years of hands-on experience shipping production AI systems across healthcare, finance, retail, media, and enterprise operations. I’ve worked with companies ranging from 10-person startups to 10,000+ employee enterprises like MARS, solving problems that move business metrics.

What I do best: Take AI from research to production. I’ve fine-tuned LLMs (LLaMA, Mistral) with LoRA/QLoRA, built RLHF pipelines with PPO, architected RAG systems over 2TB+ corpora, deployed real-time voice infrastructure handling 500+ concurrent calls, and shipped fraud detection models processing applications in real time, all on AWS/GCP at scale.

What I’m looking for: Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles where I can build and ship production AI systems.

B.S. Computer Science (COMSATS University Islamabad, 2016–2020)


What I’ve Built

The work I’m most proud of — production systems processing real data, serving real users, driving real business impact.

Voice AI Infrastructure

Real-time concurrent voice processing with zero-latency ingestion engines

  • Built voice-to-data pipelines handling 500+ simultaneous calls using WebSockets, Apache Kafka, and streaming architectures
  • Developed gRPC microservices with C++ modules (CUDA, Eigen), reducing inference latency by 25%
  • Designed speech-to-text, sentiment analysis, and sales insights extraction from live audio streams

Fraud Detection AI Co-Pilot

Ensemble ML + GenAI explainability for financial services

  • Engineered 650+ predictive features from raw application data — behavioral anomalies, timing patterns, identity verification signals
  • Built an ensemble model (XGBoost + Isolation Forest) achieving a 50% fraud detection rate on holdout test sets
  • Discovered 3 applicant personas via unsupervised clustering (UMAP + HDBSCAN) — “Digital Ghost” persona has 70% fraud concentration
  • Implemented GenAI-powered explainable PDF reports via Amazon Bedrock translating SHAP values into plain English
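
One way a supervised score and an anomaly score might be combined is a weighted blend, sketched below. This is a minimal illustration, not the project's actual ensembling logic: the function names, the 0.5 weight, and the toy scores are all assumptions, and the real inputs would come from the trained XGBoost and Isolation Forest models.

```python
from typing import Sequence


def min_max_scale(scores: Sequence[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend_scores(
    supervised_probs: Sequence[float],
    anomaly_scores: Sequence[float],
    supervised_weight: float = 0.7,  # hypothetical default, not from the project
) -> list[float]:
    """Weighted average of a classifier probability and a scaled anomaly score."""
    if len(supervised_probs) != len(anomaly_scores):
        raise ValueError("score lists must have the same length")
    scaled = min_max_scale(anomaly_scores)
    w = supervised_weight
    return [w * p + (1.0 - w) * a for p, a in zip(supervised_probs, scaled)]


# Toy inputs: three applications with made-up scores.
probs = [0.10, 0.80, 0.40]    # stand-in for classifier fraud probabilities
anomalies = [0.0, 5.0, 10.0]  # stand-in for anomaly scores (higher = more unusual)
blended = blend_scores(probs, anomalies, supervised_weight=0.5)
```

Note that scikit-learn's `IsolationForest.score_samples` returns lower values for more abnormal points, so its output would need to be negated before being passed in as `anomaly_scores` here.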

Enterprise RAG Pipelines

Knowledge retrieval across 2TB+ structured and unstructured data

  • Architected multi-index retrieval (FAISS + ChromaDB + pgvector) with cross-encoder re-ranking
  • Built hallucination detection and citation tracking for grounded LLM responses
  • Deployed on AWS SageMaker with auto-scaling — 40% cost reduction vs hosted API models
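
Results from several indexes have to be merged into one candidate list before re-ranking. The sketch below uses reciprocal rank fusion (RRF) as one common way to do that; whether these pipelines use RRF is an assumption, and the document IDs and the `k = 60` constant are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 hits from three separate indexes for the same query.
faiss_hits = ["d1", "d2", "d3"]
chroma_hits = ["d2", "d4", "d1"]
pgvector_hits = ["d2", "d1", "d5"]

fused = reciprocal_rank_fusion([faiss_hits, chroma_hits, pgvector_hits])
```

The fused list would then be the input to a cross-encoder re-ranker, which scores each query and document pair directly.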

LLM Fine-Tuning & RLHF

Parameter-efficient fine-tuning and human alignment for production LLMs

  • Fine-tuned LLaMA-2 and Mistral with LoRA, QLoRA, and PEFT; served via vLLM with CUDA optimization
  • Built full RLHF pipeline: SFT → Reward Modeling → PPO optimization with KL divergence constraints
  • Achieved 68% win rate vs SFT baseline and 96% safety compliance
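
The KL constraint in the PPO stage is commonly implemented as a per-token penalty on the reward, with the reward model's score added at the final token. The sketch below shows that shaping step in isolation. It is an illustration under assumptions: the function name, the `beta` value, and the toy log-probabilities are hypothetical, and a real pipeline would operate on tensors inside a training loop.

```python
from typing import Sequence


def kl_penalized_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    reward_model_score: float,
    beta: float = 0.1,  # hypothetical KL coefficient
) -> list[float]:
    """Per-token rewards: -beta * (log pi - log pi_ref), plus the score at the end."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        raise ValueError("at least one generated token is required")
    rewards = [-beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += reward_model_score  # sequence-level score lands on the last token
    return rewards


# Toy example: log-probs of three generated tokens under each model.
policy = [-1.0, -2.0, -0.5]     # current policy
reference = [-1.5, -2.0, -1.0]  # frozen SFT reference model
rewards = kl_penalized_rewards(policy, reference, reward_model_score=2.0, beta=0.1)
```

Tokens where the policy assigns more probability than the reference are penalized, which discourages the policy from drifting far from the SFT model while it chases reward.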

Autonomous AI Agents

Multi-agent systems executing complex workflows without human intervention

  • Built 8+ specialized agents for insurance underwriting, multilingual caregiving (100+ languages), content generation, and admissions automation
  • LangChain Agent orchestration connecting LLMs to databases, APIs, and messaging platforms
  • Reduced processing time by 50% for student admissions workflows
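
Connecting an LLM to databases and APIs usually comes down to mapping a tool name chosen by the model onto a real function. The sketch below shows a framework-agnostic tool registry for that step. It does not use LangChain, and every name in it (`ToolRegistry`, `lookup_policy`, the policy ID) is hypothetical.

```python
from typing import Callable


class ToolRegistry:
    """Maps tool names to callables so an agent loop can dispatch model requests."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        """Decorator that registers a function under the given tool name."""
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = fn
            return fn
        return decorator

    def dispatch(self, name: str, **kwargs: str) -> str:
        """Run the named tool; unknown names return an error string for the model."""
        if name not in self._tools:
            return f"error: unknown tool '{name}'"
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("lookup_policy")
def lookup_policy(policy_id: str) -> str:
    # Stand-in for a real database or API call.
    return f"policy {policy_id}: active"


# An agent loop would parse the model's tool call and dispatch it like this.
result = registry.dispatch("lookup_policy", policy_id="P-1")
```

Returning an error string instead of raising lets the agent feed the failure back to the model so it can pick a different tool.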

Computer Vision at Scale

Object detection and digital avatar generation

  • BiiView: Real-time object detection using Meta AI’s Segment Anything Model (SAM) — 90% accuracy across 11M+ images and 1.1B+ masks
  • Digital People Platform: Hyper-realistic talking avatars with SadTalker + SpeechT5 TTS — 70% realism improvement, 30% user satisfaction increase
  • KYC Platform: Identity verification with OpenCV + AI — 99.9% accuracy, 50% faster document processing

Professional Experience

| Role | Company | Period | Highlights |
| --- | --- | --- | --- |
| Senior ML/AI Engineer | Verticiti | Mar 2024 – Present | RAG pipelines (2TB+), LLM fine-tuning (LoRA/QLoRA), agentic workflows, C++ inference optimization, SAM object detection at scale. |
| Senior Generative AI Engineer | MARS (10K+ employees) | Oct 2024 – Jan 2026 | Led $1M+ GenAI enterprise transformation. RAG architectures, LLM orchestration, multi-agent frameworks for regulated industries. |
| Cloud Solution Architect | Cloud Kinetics USA | Aug 2024 – Jan 2026 | Designed cloud-native AI solutions on AWS, Azure, GCP for enterprise clients. ETL/ELT, data migration, real-time pipelines. |
| Senior AI Engineer | Reallytics.ai | Oct 2022 – Jan 2026 | Voice AI infra (500+ calls), fraud detection, autonomous agents, RLHF frameworks, cloud architecture on AWS/GCP. |
| Senior ML Engineer | Afiniti | Oct 2022 – Nov 2023 | Production ETL at scale for $1M+ accounts, churn modeling, call routing optimization. |
| AI Product Engineer | Afiniti | Apr 2021 – Oct 2022 | ML pipelines for call-routing, feature engineering on millions of daily records, production monitoring. |
| Python Engineer | MeryCure | May 2020 – Apr 2021 | IoT data pipelines (1000+ devices), anomaly detection, predictive maintenance, Power BI dashboards. |

Featured Projects

Sentinel AI — Fraud Detection Ensemble XGBoost + Isolation Forest with 650+ features and GenAI explainability via Amazon Bedrock.

Python XGBoost AWS Bedrock SageMaker

Voice-AI-Platform Real-time voice processing — 500+ concurrent calls, WebSockets, Kafka, gRPC/C++.

Python Kafka gRPC C++ AWS

BiiView — Object Detection Meta AI SAM for video object detection — 90% accuracy across 11M+ images.

Python SAM OpenCV PyTorch

RAG-Enterprise-Search Enterprise RAG with multi-index fusion, re-ranking, and hallucination detection.

Python LangChain FAISS ChromaDB

LLM-Fine-Tuning-LoRA Fine-tuning LLaMA/Mistral with LoRA, QLoRA, PEFT — 40% cost reduction vs hosted APIs.

Python HuggingFace vLLM CUDA

RLHF-LLM-Optimization Full RLHF pipeline — SFT, reward modeling, PPO with KL constraints.

Python PyTorch HuggingFace TRL

Digital People Platform Talking avatars with SadTalker + SpeechT5 TTS — 70% realism improvement.

Python SadTalker OpenAI PyTorch

Agentic-AI-Workflows Autonomous AI agents for enterprise automation with LangChain orchestration.

Python LangChain OpenAI FastAPI

View all repositories →


Kaggle — Research & Technical Notebooks

Hands-on explorations, architecture deep-dives, and production-tested techniques — published on Kaggle.

🤖 Agentic AI: Multi-Agent Orchestration from Scratch Building a multi-agent system with tool registries, planning loops, and guardrails — framework-agnostic patterns from production.

🔌 LLM Function Calling and Tool Use: Complete Guide End-to-end function calling — schema design, validation, chaining, error recovery, and production deployment patterns.

🔍 Advanced RAG: Production Retrieval Guide Multi-query RAG, hybrid search, cross-encoder re-ranking, hallucination detection — beyond basic retrieve-and-generate.

🎯 Prompt Engineering That Actually Works (2026) Chain-of-thought, few-shot, self-consistency, structured output — real techniques with measured results.

👁️ Multimodal AI: Vision-Language Pipeline Vision encoders, cross-attention fusion, image captioning, visual QA — building multimodal systems from components.

💳 Fraud Detection: XGBoost + Isolation Forest Ensemble Ensemble anomaly detection with SHAP explainability, t-SNE visualization, and DBSCAN clustering on imbalanced data.

💬 Sentiment Analysis: NLP Pipeline Comparison TF-IDF vs BERT vs DistilBERT — benchmarking classical and transformer approaches on real text data.

📚 RAG Pipeline: LangChain + FAISS for Document QA End-to-end retrieval-augmented generation with chunk strategies, embedding models, and answer grounding.

🧬 LLM Fine-Tuning: LoRA and QLoRA Guide Parameter-efficient fine-tuning walkthrough — LoRA, QLoRA, PEFT with memory profiling and serving benchmarks.

📈 Time Series: XGBoost Forecasting Feature engineering for temporal data — lag features, rolling stats, calendar effects, walk-forward validation.

🚢 Titanic: Stacking Ensemble Pipeline Advanced stacking with cross-validated base learners, meta-learner optimization, and feature engineering.

👉 View all notebooks on Kaggle →

Featured Writeups & Datasets

Technical writeups published as Kaggle Datasets — production insights, benchmarks, and reference architectures.

| Writeup | What’s Inside |
| --- | --- |
| Agentic AI Tool Schemas: Production Patterns | 50+ tool/function schemas, 8 agent configs, benchmark data from 500 agent executions |
| RAG Evaluation Benchmark 2026 | 1,000 QA pairs with human-annotated relevance scores across 50 retrieval configs |
| LLM Prompt Engineering Templates | 100+ prompt templates with A/B test results from 200 production experiments |
| Fraud Detection: Feature Engineering Guide | 650+ feature catalog, interaction analysis, and 3 fraud persona profiles |
| ML System Design Patterns: Production | 40+ patterns, 25+ anti-patterns, decision frameworks for production ML |

Tech Stack

Languages & Frameworks

Python C++ PyTorch TensorFlow scikit-learn OpenCV FastAPI Flask

Generative AI & LLMs

LangChain OpenAI Claude HuggingFace vLLM FAISS ChromaDB Pinecone

Cloud & Infrastructure

AWS SageMaker Bedrock Azure GCP Docker Kubernetes Terraform CUDA

Data Engineering

Kafka PySpark Airflow PostgreSQL MongoDB Redis DynamoDB gRPC WebSockets


Education & Certifications

  • B.S. Computer Science (COMSATS University Islamabad, 2016–2020)
  • Foundations: Data, Data, Everywhere (Google)
  • PostgreSQL: Advanced Queries (LinkedIn Learning)
  • SQL Essential Training (LinkedIn Learning)


📰 Latest AI Research Articles

Auto-generated articles with AI-crafted images — published daily to AI-Engineering-Notes

LLM Fine-Tuning at Scale with LoRA
2026-04-06

📚 View all articles →


⚡ Recent Activity

💬 Commented on Widen asset format supports in dottxt-ai/outlines (2026-04-06)

⭐ Starred oceanbase/pyseekdb (2026-04-06)

⭐ Starred AlayaDB-AI/AlayaLite (2026-04-06)

⭐ Starred Pometry/Raphtory (2026-04-06)

⭐ Starred unum-cloud/USearch (2026-04-06)

💬 Commented on Token Safety Tool for DeFi Multi-Agent Workflows in FoundationAgents/MetaGPT (2026-04-06)

📝 Opened issue [Feature] Add real-time streaming evaluation for production in vibrantlabsai/ragas (2026-04-06)

💬 Commented on [Feature Request] Saving GPU-Translated States for Fast CPU- in facebookresearch/faiss (2026-04-06)


🔬 Currently Researching

Topics discovered daily by a multi-model AI research engine (GPT-4.1, Grok-3, DeepSeek R1, Llama-4)

Research engine discovering trending topics...


📌 Latest Code Snippets

📌 Prompt Version Control & A/B Testing Registry (Python) (2026-04-06)

📌 Webhook Event Processor for ML Model Alerts (Python) (2026-04-06)


🤖 Profile auto-updated on 2026-04-06 10:49 UTC

GitHub Stats

GitHub Stats Top Languages


Currently open to Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles.
If you’re building production AI systems and need someone who ships — let’s talk.
