Rehan Malik rehan243

Rehan Malik

Senior AI/ML Engineer · Cloud Solution Architect (AWS) · Open to Opportunities

5+ years building production AI systems at enterprise scale — GenAI, LLMs, RAG, RLHF, Computer Vision, Voice AI, Cloud Architecture.

About Me

I’m a Senior AI/ML Engineer with 5+ years of hands-on experience shipping production AI systems across healthcare, finance, retail, media, and enterprise operations. I’ve worked with companies ranging from 10-person startups to 10,000+ employee enterprises like MARS, solving problems that move business metrics.

What I do best: Take AI from research to production. I’ve fine-tuned LLMs (LLaMA, Mistral) with LoRA/QLoRA, built RLHF pipelines with PPO, architected RAG systems over 2TB+ corpora, deployed real-time voice infrastructure handling 500+ concurrent calls, and shipped fraud detection models that process applications in real time, all on AWS/GCP at scale.

What I’m looking for: Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles where I can build and ship production AI systems.

B.S. Computer Science (COMSATS University Islamabad, 2016–2020)

What I’ve Built

The work I’m most proud of — production systems processing real data, serving real users, driving real business impact.

Voice AI Infrastructure

Real-time concurrent voice processing with low-latency ingestion engines

Built voice-to-data pipelines handling 500+ simultaneous calls using WebSockets, Apache Kafka, and streaming architectures
Developed gRPC microservices with C++ modules (CUDA, Eigen), reducing inference latency by 25%
Designed speech-to-text, sentiment analysis, and sales insights extraction from live audio streams

Fraud Detection AI Co-Pilot

Ensemble ML + GenAI explainability for financial services

Engineered 650+ predictive features from raw application data — behavioral anomalies, timing patterns, identity verification signals
Built an ensemble model (XGBoost + Isolation Forest) achieving a 50% fraud detection rate on holdout test sets
Discovered 3 applicant personas via unsupervised clustering (UMAP + HDBSCAN) — “Digital Ghost” persona has 70% fraud concentration
Implemented GenAI-powered explainable PDF reports via Amazon Bedrock translating SHAP values into plain English
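
The SHAP-to-plain-English step can be illustrated with a minimal sketch. It assumes per-feature SHAP contributions have already been computed and arrive as a plain dict. The feature names, values, and the `explain` helper are hypothetical, and in the system described above the final wording comes from an LLM via Amazon Bedrock rather than from fixed templates.

```python
def explain(contributions, top_k=3):
    """Turn per-feature SHAP-style contributions into short sentences.

    contributions: dict mapping feature name -> signed contribution to the
    fraud score (positive pushes toward fraud). Values here are hypothetical.
    """
    # Rank features by absolute impact so the report leads with what matters.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = []
    for name, value in ranked[:top_k]:
        direction = "raised" if value > 0 else "lowered"
        readable = name.replace("_", " ")
        lines.append(f"{readable} {direction} the fraud score by {abs(value):.2f}")
    return lines


# Made-up contributions for a single application.
example = {
    "application_hour": 0.31,
    "email_age_days": -0.12,
    "device_reuse_count": 0.45,
    "income_stated": 0.02,
}
report = explain(example)
```

Sentences like these could be passed to the LLM as grounded context, so the generated report stays tied to the model's actual attributions.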

Enterprise RAG Pipelines

Knowledge retrieval across 2TB+ structured and unstructured data

Architected multi-index retrieval (FAISS + ChromaDB + pgvector) with cross-encoder re-ranking
Built hallucination detection and citation tracking for grounded LLM responses
Deployed on AWS SageMaker with auto-scaling — 40% cost reduction vs hosted API models
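
One common way to merge hits from several indexes before re-ranking is reciprocal rank fusion (RRF). The profile does not say which fusion method is used, so RRF here is an assumption, and the document ids are made up. The `rerank` function takes any scoring callable as a stand-in for a cross-encoder.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(query, candidates, score_fn, top_n=3):
    """Re-order fused candidates with a pairwise scorer.

    score_fn(query, doc_id) stands in for a cross-encoder; any callable works.
    """
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


# Hypothetical results from three indexes for the same query.
faiss_hits = ["d1", "d2", "d3"]
chroma_hits = ["d2", "d4", "d1"]
pgvector_hits = ["d2", "d1", "d5"]
fused = reciprocal_rank_fusion([faiss_hits, chroma_hits, pgvector_hits])
```

Fusing on ranks rather than raw similarity scores avoids having to calibrate scores across indexes that use different distance metrics.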

LLM Fine-Tuning & RLHF

Parameter-efficient fine-tuning and human alignment for production LLMs

Fine-tuned LLaMA-2 and Mistral with parameter-efficient methods (LoRA, QLoRA), served via vLLM with CUDA optimization
Built full RLHF pipeline: SFT → Reward Modeling → PPO optimization with KL divergence constraints
Achieved 68% win rate vs SFT baseline and 96% safety compliance
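
The KL divergence constraint in the PPO stage is often implemented as a per-token penalty subtracted from the reward-model score. The sketch below shows that shaping under stated assumptions: log-probabilities are given as plain lists, `beta` is a hypothetical coefficient, and the function name is illustrative rather than taken from any library.

```python
def kl_shaped_rewards(policy_logprobs, ref_logprobs, reward_model_score, beta=0.1):
    """Per-token rewards for PPO with a KL penalty against the SFT reference.

    policy_logprobs / ref_logprobs: log-probabilities of the sampled tokens
    under the current policy and the frozen reference model.
    The penalty uses the sample-based estimate log pi(a) - log pi_ref(a).
    The scalar reward-model score is added on the final token only.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    rewards = [-beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    if rewards:
        rewards[-1] += reward_model_score
    return rewards


# Hypothetical three-token response: only the second token drifts from the reference.
rewards = kl_shaped_rewards(
    [-1.0, -0.5, -2.0], [-1.0, -1.5, -2.0], reward_model_score=1.0, beta=0.1
)
```

A larger `beta` keeps the policy closer to the SFT model at the cost of slower reward improvement.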

Autonomous AI Agents

Multi-agent systems executing complex workflows without human intervention

Built 8+ specialized agents for insurance underwriting, multilingual caregiving (100+ languages), content generation, and admissions automation
Orchestrated agents with LangChain, connecting LLMs to databases, APIs, and messaging platforms
Reduced processing time by 50% for student admissions workflows
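
Connecting an LLM to databases and APIs largely comes down to routing a model-chosen tool call to a function and handling failures safely. The sketch below is framework-agnostic and is not LangChain's API. `ToolRegistry`, `lookup_application`, and the sample record are all hypothetical.

```python
class ToolRegistry:
    """Maps tool names to plain Python callables."""

    def __init__(self):
        self._tools = {}

    def register(self, name):
        """Decorator that stores a function under the given tool name."""
        def decorator(fn):
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name, **kwargs):
        # Guardrail: refuse tools the registry does not know about.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs)}
        except Exception as exc:  # report failures to the caller instead of crashing
            return {"ok": False, "error": str(exc)}


registry = ToolRegistry()


@registry.register("lookup_application")
def lookup_application(applicant_id):
    # Stand-in for a database query; the record below is made up.
    return {"applicant_id": applicant_id, "status": "pending"}


# In a real agent the tool name and arguments would come from the LLM's output.
outcome = registry.call("lookup_application", applicant_id="A-17")
missing = registry.call("send_fax")
```

Returning a structured error rather than raising lets the agent loop feed the failure back to the model and try a different action.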

Computer Vision at Scale

Object detection and digital avatar generation

BiiView: Real-time object detection using Meta AI’s Segment Anything Model (SAM) — 90% accuracy across 11M+ images and 1.1B+ masks
Digital People Platform: Hyper-realistic talking avatars with SadTalker + SpeechT5 TTS — 70% realism improvement, 30% user satisfaction increase
KYC Platform: Identity verification with OpenCV + AI — 99.9% accuracy, 50% faster document processing

Professional Experience

Role	Company	Period	Highlights
Senior ML/AI Engineer	Verticiti	Mar 2024 – Present	RAG pipelines (2TB+), LLM fine-tuning (LoRA/QLoRA), agentic workflows, C++ inference optimization, SAM object detection at scale.
Senior Generative AI Engineer	MARS (10K+ employees)	Oct 2024 – Jan 2026	Led $1M+ GenAI enterprise transformation. RAG architectures, LLM orchestration, multi-agent frameworks for regulated industries.
Cloud Solution Architect	Cloud Kinetics USA	Aug 2024 – Jan 2026	Designed cloud-native AI solutions on AWS, Azure, GCP for enterprise clients. ETL/ELT, data migration, real-time pipelines.
Senior AI Engineer	Reallytics.ai	Oct 2022 – Jan 2026	Voice AI infra (500+ calls), fraud detection, autonomous agents, RLHF frameworks, cloud architecture on AWS/GCP.
Senior ML Engineer	Afiniti	Oct 2022 – Nov 2023	Production ETL at scale for $1M+ accounts, churn modeling, call routing optimization.
AI Product Engineer	Afiniti	Apr 2021 – Oct 2022	ML pipelines for call-routing, feature engineering on millions of daily records, production monitoring.
Python Engineer	MeryCure	May 2020 – Apr 2021	IoT data pipelines (1000+ devices), anomaly detection, predictive maintenance, Power BI dashboards.

Featured Projects

Sentinel AI — Fraud Detection
Ensemble XGBoost + Isolation Forest with 650+ features and GenAI explainability via Amazon Bedrock.

Python XGBoost AWS Bedrock SageMaker

Voice-AI-Platform
Real-time voice processing: 500+ concurrent calls, WebSockets, Kafka, gRPC/C++.

Python Kafka gRPC C++ AWS

BiiView — Object Detection
Meta AI SAM for video object detection, with 90% accuracy across 11M+ images.

Python SAM OpenCV PyTorch

RAG-Enterprise-Search
Enterprise RAG with multi-index fusion, re-ranking, and hallucination detection.

Python LangChain FAISS ChromaDB

LLM-Fine-Tuning-LoRA
Fine-tuning LLaMA/Mistral with LoRA and QLoRA, with a 40% cost reduction vs hosted APIs.

Python HuggingFace vLLM CUDA

RLHF-LLM-Optimization
Full RLHF pipeline: SFT, reward modeling, PPO with KL constraints.

Python PyTorch HuggingFace TRL

Digital People Platform
Talking avatars with SadTalker + SpeechT5 TTS, with a 70% realism improvement.

Python SadTalker OpenAI PyTorch

Agentic-AI-Workflows
Autonomous AI agents for enterprise automation with LangChain orchestration.

Python LangChain OpenAI FastAPI

View all repositories →

Kaggle — Research & Technical Notebooks

Hands-on explorations, architecture deep-dives, and production-tested techniques — published on Kaggle.

Notebook	What’s Inside
🤖 Agentic AI: Multi-Agent Orchestration from Scratch	Building a multi-agent system with tool registries, planning loops, and guardrails; framework-agnostic patterns from production.
🔌 LLM Function Calling and Tool Use: Complete Guide	End-to-end function calling: schema design, validation, chaining, error recovery, and production deployment patterns.
🔍 Advanced RAG: Production Retrieval Guide	Multi-query RAG, hybrid search, cross-encoder re-ranking, and hallucination detection, beyond basic retrieve-and-generate.
🎯 Prompt Engineering That Actually Works (2026)	Chain-of-thought, few-shot, self-consistency, structured output: real techniques with measured results.
👁️ Multimodal AI: Vision-Language Pipeline	Vision encoders, cross-attention fusion, image captioning, visual QA: building multimodal systems from components.
💳 Fraud Detection: XGBoost + Isolation Forest Ensemble	Ensemble anomaly detection with SHAP explainability, t-SNE visualization, and DBSCAN clustering on imbalanced data.
💬 Sentiment Analysis: NLP Pipeline Comparison	TF-IDF vs BERT vs DistilBERT: benchmarking classical and transformer approaches on real text data.
📚 RAG Pipeline: LangChain + FAISS for Document QA	End-to-end retrieval-augmented generation with chunk strategies, embedding models, and answer grounding.
🧬 LLM Fine-Tuning: LoRA and QLoRA Guide	Parameter-efficient fine-tuning walkthrough: LoRA, QLoRA, PEFT with memory profiling and serving benchmarks.
📈 Time Series: XGBoost Forecasting	Feature engineering for temporal data: lag features, rolling stats, calendar effects, walk-forward validation.
🚢 Titanic: Stacking Ensemble Pipeline	Advanced stacking with cross-validated base learners, meta-learner optimization, and feature engineering.

👉 View all notebooks on Kaggle →

Featured Writeups & Datasets

Technical writeups published as Kaggle Datasets — production insights, benchmarks, and reference architectures.

Writeup	What’s Inside
Agentic AI Tool Schemas: Production Patterns	50+ tool/function schemas, 8 agent configs, benchmark data from 500 agent executions
RAG Evaluation Benchmark 2026	1,000 QA pairs with human-annotated relevance scores across 50 retrieval configs
LLM Prompt Engineering Templates	100+ prompt templates with A/B test results from 200 production experiments
Fraud Detection: Feature Engineering Guide	650+ feature catalog, interaction analysis, and 3 fraud persona profiles
ML System Design Patterns: Production	40+ patterns, 25+ anti-patterns, decision frameworks for production ML

Tech Stack

Languages & Frameworks

Generative AI & LLMs

Cloud & Infrastructure

Data Engineering

Education & Certifications


B.S. Computer Science	COMSATS University Islamabad, 2016–2020
Foundations: Data, Data, Everywhere	Google
PostgreSQL: Advanced Queries	LinkedIn Learning
SQL Essential Training	LinkedIn Learning

📰 Latest AI Research Articles

Auto-generated articles with AI-crafted images — published daily to AI-Engineering-Notes

LLM Fine-Tuning at Scale with LoRA (2026-04-06)

📚 View all articles →

⚡ Recent Activity

💬 Commented on Widen asset format supports in dottxt-ai/outlines (2026-04-06)

⭐ Starred oceanbase/pyseekdb (2026-04-06)

⭐ Starred AlayaDB-AI/AlayaLite (2026-04-06)

⭐ Starred Pometry/Raphtory (2026-04-06)

⭐ Starred unum-cloud/USearch (2026-04-06)

💬 Commented on Token Safety Tool for DeFi Multi-Agent Workflows in FoundationAgents/MetaGPT (2026-04-06)

📝 Opened issue [Feature] Add real-time streaming evaluation for production in vibrantlabsai/ragas (2026-04-06)

💬 Commented on [Feature Request] Saving GPU-Translated States for Fast CPU- in facebookresearch/faiss (2026-04-06)

🔬 Currently Researching

Topics discovered daily by a multi-model AI research engine (GPT-4.1, Grok-3, DeepSeek R1, Llama-4)

Research engine discovering trending topics...

📌 Latest Code Snippets

📌 Prompt Version Control & A/B Testing Registry (Python) (2026-04-06)

📌 Webhook Event Processor for ML Model Alerts (Python) (2026-04-06)

🤖 Profile auto-updated on 2026-04-06 10:49 UTC

GitHub Stats

Currently open to Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles.
If you’re building production AI systems and need someone who ships — let’s talk.
