Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma
arXiv
project
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu*, Ligeng Zhu*, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Vishwesh Nath, Jinyi Hu, Sifei Liu, Ranjay Krishna, Daguang Xu, Xiaolong Wang, Pavlo Molchanov, Jan Kautz, Hongxu Yin*, Song Han*, Yao Lu*
arXiv
code
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Fuxiao Liu*, Min Shi*, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu*, Guilin Liu*
International Conference on Learning Representations (ICLR), 2025
arXiv
code
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
International Conference on Learning Representations (ICLR), 2025
arXiv
project
code
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin
arXiv
ARDuP: Active Region Video Diffusion for Universal Policies
Shuaiyi Huang, Mara Levy, Zhenyu Jiang, Anima Anandkumar, Yuke Zhu, Linxi Fan, De-An Huang, Abhinav Shrivastava
International Conference on Intelligent Robots and Systems (IROS), 2024
arXiv
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz
European Conference on Computer Vision (ECCV), 2024
arXiv
code
PerAda: Parameter-Efficient and Generalizable Federated Learning Personalization with Guarantees
Chulin Xie, De-An Huang, Wenda Chu, Daguang Xu, Chaowei Xiao, Bo Li, Anima Anandkumar
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
arXiv
What is Point Supervision Worth in Video Instance Segmentation?
Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar
IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
arXiv
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar
International Conference on Learning Representations (ICLR), 2024
project
Eureka: Human-Level Reward Design via Coding Large Language Models
Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar
International Conference on Learning Representations (ICLR), 2024
project
Differentially Private Video Activity Recognition
Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Anima Anandkumar
Winter Conference on Applications of Computer Vision (WACV), 2024
arXiv
Deep Multimodal Fusion for Surgical Feedback Classification
Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung
Machine Learning for Health (ML4H), 2023 (Best Proceedings Paper)
arXiv
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Anand Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Mohammad Shoeybi, Ming-Yu Liu, Yuke Zhu, Bryan Catanzaro, Chaowei Xiao*, Anima Anandkumar*
Empirical Methods in Natural Language Processing (EMNLP), 2023
arXiv
I²SB: Image-to-Image Schrödinger Bridge
Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie†, and Anima Anandkumar†
International Conference on Machine Learning (ICML), 2023
project
code
Dr-Fairness: Dynamic Data Ratio Adjustment for Fair Training on Real and Generated Data
Yuji Roh, Weili Nie, De-An Huang, Steven Euijong Whang, Arash Vahdat, and Anima Anandkumar
Transactions on Machine Learning Research (TMLR), 2023
code
Capturing Fine-grained Details for Video-based Automation of Suturing Skills Assessment
Andrew J. Hung, Richard Bao, Idris O. Sunmola, De-An Huang, Jessica H. Nguyen, Anima Anandkumar
International Journal of Computer Assisted Radiology and Surgery (IJCARS), 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan, Guanzhi Wang*, Yunfan Jiang*, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu†, Anima Anandkumar†
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022
arXiv
project
code
video
blog
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
De-An Huang, Zhiding Yu, Anima Anandkumar
Neural Information Processing Systems (NeurIPS), 2022
arXiv
code
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao
Neural Information Processing Systems (NeurIPS), 2022
arXiv
project
Pre-Trained Language Models for Interactive Decision-Making
Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu
Neural Information Processing Systems (NeurIPS), 2022
arXiv
project
code
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun, De-An Huang, Bo Lu, Yun-Hui Liu, Bolei Zhou, Animesh Garg
IEEE Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2022
arXiv
project
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
International Conference on Machine Learning (ICML), 2021
arXiv
project
Procedure Planning in Instructional Videos
Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles
European Conference on Computer Vision (ECCV), 2020
arXiv
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
arXiv
Motion Reasoning for Goal-Based Imitation Learning
De-An Huang, Yu-Wei Chao*, Chris Paxton*, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox
International Conference on Robotics and Automation (ICRA), 2020
video
Regression Planning Networks
Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
Neural Information Processing Systems (NeurIPS), 2019
Imitation Learning for Human Pose Prediction
Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, and Juan Carlos Niebles
IEEE International Conference on Computer Vision (ICCV), 2019
Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, and Juan Carlos Niebles
International Conference on Intelligent Robots and Systems (IROS), 2019
arXiv
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, and Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
arXiv
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, and Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
arXiv
Action-Agnostic Human Pose Forecasting
Hsu-Kuang Chiu, Ehsan Adeli, Borui Wang, De-An Huang, and Juan Carlos Niebles
IEEE Winter Conference on Applications of Computer Vision (WACV), 2019
arXiv
code
Learning to Decompose and Disentangle Representations for Video Prediction
Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles
Neural Information Processing Systems (NIPS), 2018
arXiv
code
Temporal Modular Networks for Retrieving Complex Compositional Activities in Video
Bingbin Liu, Serena Yeung, Edward Chou, De-An Huang, Li Fei-Fei, and Juan Carlos Niebles
European Conference on Computer Vision (ECCV), 2018
Neural Graph Matching Networks for Fewshot 3D Action Recognition
Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, and Li Fei-Fei
European Conference on Computer Vision (ECCV), 2018
Focus on the Hard Things: Dynamic Task Prioritization for Multitask Learning
Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, and Li Fei-Fei
European Conference on Computer Vision (ECCV), 2018
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video
De-An Huang*, Shyamal Buch*, Lucio Dery, Animesh Garg, Li Fei-Fei, and Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Oral)
project
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, and Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
Visual Forecasting by Imitating Dynamics in Natural Sequences
Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, and Juan Carlos Niebles
IEEE International Conference on Computer Vision (ICCV), 2017 (Spotlight)
Activity Forecasting: An Invitation to Predictive Perception
Kris M. Kitani, De-An Huang, and Wei-Chiu Ma
Book: Group and Crowd Behavior for Computer Vision. Chapter 12, 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang, Joseph J. Lim, Li Fei-Fei, and Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
arXiv
project
Unsupervised Learning of Long-Term Motion Dynamics for Videos
Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
arXiv
Forecasting Interactive Dynamics of Pedestrians with Fictitious Play
Wei-Chiu Ma, De-An Huang, Namhoon Lee, and Kris M. Kitani
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
arXiv
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
De-An Huang, Li Fei-Fei, and Juan Carlos Niebles
European Conference on Computer Vision (ECCV), 2016
arXiv
project
video
How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps
De-An Huang, Minghuang Ma*, Wei-Chiu Ma*, and Kris M. Kitani
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
supplementary
extended abstract
Approximate MaxEnt Inverse Optimal Control and its Application for Mental Simulation of Human Interactions
De-An Huang, A. M. Farahmand, Kris M. Kitani, and J. Andrew Bagnell
AAAI Conference on Artificial Intelligence (AAAI), 2015
supplementary
Action-Reaction: Forecasting the Dynamics of Human Interaction
De-An Huang and Kris M. Kitani
European Conference on Computer Vision (ECCV), 2014
video
Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition
De-An Huang and Yu-Chiang Frank Wang
IEEE International Conference on Computer Vision (ICCV), 2013
code
With One Look: Robust Face Recognition Using Single Sample Per Person
De-An Huang and Yu-Chiang Frank Wang
ACM Multimedia, short paper, 2013
Self-Learning Based Image Decomposition with Applications to Single Image Denoising
D.-A. Huang, L.-W. Kang, Y.-C. F. Wang, and C.-W. Lin
IEEE Transactions on Multimedia (TMM), volume 16, number 1, pages 1-11, January 2014
Context-Aware Single Image Rain Removal
D.-A. Huang, L.-W. Kang, C.-Y. Tsai, M.-C. Yang, C.-W. Lin, and Y.-C. F. Wang
IEEE International Conference on Multimedia & Expo (ICME), 2012
Self-Learning of Edge-Preserving Single Image Super-Resolution via Contourlet Transform
M.-C. Yang*, D.-A. Huang*, C.-Y. Tsai, and Y.-C. F. Wang
IEEE International Conference on Multimedia & Expo (ICME), 2012
Context-Aware Single Image Super-Resolution Using Locality-Constrained Group Sparse Representation
C.-Y. Tsai, D.-A. Huang, M.-C. Yang, L.-W. Kang, and Y.-C. F. Wang
Visual Communications and Image Processing (VCIP), 2012
Species Minimization in Computation with Biochemical Reactions
R.-Y. Huang, D.-A. Huang, H.-J. K. Chiang, J.-H. R. Jiang, and F. Fages
International Workshop on Bio-Design Automation (IWBDA), 2013
Compiling Program Control Flows into Biochemical Reactions
D.-A. Huang, J.-H. R. Jiang, R.-Y. Huang, and C.-Y. Cheng
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2012