I’m a Machine Learning Engineer + Researcher currently pursuing my M.S. in Computer Science at NYU Courant (GPA: 4.0).
My work sits at the intersection of:
- LLM reasoning, retrieval & attention mechanisms
- Healthcare ML & safety for deployed clinical AI
- Document intelligence, VLMs, synthetic data generation
- Production ML systems & model monitoring
Working with Prof. Eunsol Choi on multilingual LLM retrieval:
- Improving in-context fact retrieval across 5 languages
- Modifying attention mechanisms in LLaMa-3.2-8B, Qwen-2.5-7B, Phi-3.5
- Achieved 15% retrieval gains with a 30% smaller KV-cache
Building production-grade safety systems for ML models powering clinical workflows across 23 hospitals.
- Designed drift detection pipelines (Kolmogorov–Smirnov tests, PSI, DeLong's test)
- Real-time monitoring with Prometheus + Grafana
- Worked extensively with HIPAA-compliant datasets (Epic Cosmos, OMOP CDM, Caboodle, Clarity)
- Co-authored NIH & PCORI grant proposals
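For readers unfamiliar with PSI, here is a minimal, generic sketch of the Population Stability Index as it is commonly defined: bins are fixed on a reference sample, and PSI sums (current% − reference%) · ln(current% / reference%) over those bins. This is not the production pipeline's code; the decile binning, the `eps` clipping value, and the function name `population_stability_index` are assumptions chosen for illustration.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Generic PSI sketch: bins are quantiles of the reference sample (assumed deciles)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from reference quantiles; values below/above the
    # reference range fall into the first/last bin automatically.
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])

    # Map every value to a bin index in [0, n_bins - 1].
    ref_idx = np.searchsorted(inner_edges, reference, side="right")
    cur_idx = np.searchsorted(inner_edges, current, side="right")

    # Fraction of each sample per bin.
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / reference.size
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / current.size

    # Clip empty bins to avoid log(0) and division by zero (assumed epsilon).
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)

    # Each term (c - r) * ln(c / r) is non-negative, so PSI >= 0.
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, 10_000)   # e.g. a feature at deployment time
    shifted = rng.normal(1.0, 1.0, 10_000)    # same feature after a mean shift
    print(population_stability_index(baseline, shifted))
```

A commonly cited rule of thumb treats PSI below 0.1 as little shift, 0.1 to 0.25 as moderate, and above 0.25 as significant; in a monitoring setup, such a score could be exported as a metric and alerted on, though the thresholds should be tuned per feature.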
Published at ICDAR 2025 (Oral, Top 2%).
- Generated 18k synthetic slides using a novel LLM pipeline
- Boosted VLM performance by 13% mAP and 10% Recall@K
- HuggingFace model reached 500+ downloads
AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
Maniyar, Trivedi et al.
🔗 Project: https://synslidegen.github.io
🔗 DOI: https://doi.org/10.1007/978-3-032-04614-7_11
🔗 Code: https://github.com/NerdyVisky/adaptive-gpu-hashtable
- Built a high-performance adaptive GPU hash table in C++/CUDA using cooperative groups and elected-lane atomics
- Achieved 21× faster inserts and 20× faster lookups than naïve GPU hashing at scale
- Implemented epoch-based dynamic resizing + compaction for non-blocking concurrency
- Sustains stable throughput over 100M+ operations, even at a 0.99 load factor
- Designed Attention-Aware DPO (AA-DPO), improving multi-image VQA accuracy by 8.5%
- Applied AdaptVis at inference time, gaining 10% over the base model
- Built an LLM-as-a-judge evaluator using Gemini-2.5-Pro
🔗 Code: https://github.com/harsh-sutariya/AA-DPO
🔗 Website: https://nerdyvisky.github.io/projects/AttnDPO/
- Refactored full codebase for faster, leaner execution
- Built vectorized dataloaders, added FlashAttention, and integrated vLLM
- Reduced inference time from 2 hrs to 30 mins (4× faster)
🔗 Code: https://github.com/NerdyVisky/multilingual-retrieval-translation-heads
Python · C/C++ · R · SQL (Postgres, MySQL) · JavaScript · TypeScript · Bash/Zsh
PyTorch · TensorFlow · HuggingFace · LangChain
NumPy · Pandas · scikit-learn · Matplotlib
AWS · GCP · Azure · Databricks
Docker · Git · Redis · MongoDB
Prometheus · Grafana
Thanks for stopping by! Feel free to reach out if you're working on LLMs, retrieval, ML safety, or healthcare AI. 🚀
[Nov 2025] - I am looking for full-time SDE, MLE, and Applied Science roles in the US starting Summer 2026. I am a US Permanent Resident (Green Card holder) and do not require visa sponsorship. If you're hiring and like my work, feel free to reach out by email: [email protected]


