🖐️ Interests: VLM, Document AI, Multi-modal Tasks (CV+NLP), OCR, CV, RL
- Programming: Python
- Frameworks & Libraries: PyTorch, OpenCV, TensorFlow
- Deployment & Serving: Docker, Triton, vLLM
- OCR: Scene Text Detection (STD), Scene Text Recognition (STR)
- NER: Named Entity Recognition
- TSR: Table Structure Recognition
- Detection: Strikethrough, Checkmark, Circled Number, Document Contour
- DLA: Document Layout Analysis
- LangGraph based VLM workflow
- Model deployment using Docker, Triton, and vLLM
- API development with FastAPI
- Synthetic Data Generation (Image Processing)
- Data Annotation Management (Label Studio, LabelMe)
- Document AI Pipeline Design & Implementation
- PoC (Proof of Concept) Support
📄 My Papers
- Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation - 2021.02, Journal of Sensors (SCIE)
- Multimodal Joint Embedding and Backtracking Search for Vision-and-Language Navigation - Master's thesis
- AnoVid: A Deep Neural Network-Based Tool for Video Annotation - 2020.08, Journal of KMMS
- Landmark-based Search for Vision-and-Language Navigation - 2019.12, KSC Conference
- LVLN: A Landmark-Based Deep Neural Network Model for Vision-and-Language Navigation - 2019.09, Journal of KIPS (KTSDE)
- Real-Time Visual Grounding for Natural Language Instructions with Deep Neural Network - 2019.05, KIPS Conference
- Deep Reinforcement Learning for Optimizing Visual Questions - 2018.09, Journal of ICROS
- Deep Reinforcement Learning for Visual Dialogue Agents - 2018.05, KIPS Conference

