
OpenGVLab

General Vision Team of Shanghai AI Laboratory


Welcome to OpenGVLab! 👋

We are a research group from Shanghai AI Lab focused on vision-centric AI research. The "GV" in OpenGVLab stands for general vision: a general understanding of vision, so that little effort is needed to adapt our models to new vision-based tasks.

We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 SOTA results 🚀. In 2022, our open-sourced foundation models achieved 65.5 mAP on the COCO object detection benchmark and 91.1% top-1 accuracy on Kinetics-400, landmark results for AI vision 👀 tasks in image 🖼️ and video 📹 understanding. In 2023, we created VideoChat 🦜, LLaMA-Adapter 🦙, the 3D foundation model PonderV2 🧊, and many more wonderful works! At CVPR 2023, our vision foundation model InternImage was listed as one of the most influential papers, and together with our partner OpenDriveLab we won the Best Paper Award 🎉.

In 2024, we released our flagship open-source VLM InternVL and the video understanding foundation model InternVideo2, which won 7 championships in the EgoVis challenges 🥇. To date, our brilliant team has open-sourced more than 70 works; please find them here 😃

Building on these solid vision foundations, we have expanded into multi-modality models. We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and lessening the burden of building an AI model from scratch.

Branches: Alpha (exploring the latest advances in vision + language research), uni-medical (focused on medical AI), Vchitect (generative AI)

Follow us: Twitter · 🤗 Hugging Face · Medium · WeChat · Zhihu

Pinned

  1. InternVL Public

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. (An open-source multimodal chat model approaching GPT-4o's performance.)

    Python · 6.1k stars · 478 forks

  2. InternVideo Public

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

    Python · 1.4k stars · 88 forks

  3. Ask-Anything Public

    [CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as MiniGPT-4, StableLM, and MOSS.

    Python · 3.1k stars · 252 forks

  4. VideoMamba Public

    [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

    Python · 846 stars · 61 forks

  5. OmniQuant Public

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Python · 732 stars · 56 forks

  6. LLaMA-Adapter Public

    [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

    Python · 5.8k stars · 375 forks

Repositories

Showing 10 of 68 repositories
  • Ask-Anything Public

    [CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as MiniGPT-4, StableLM, and MOSS.

    Python · 3,082 stars · MIT · 252 forks · 60 issues · 5 PRs · Updated Nov 26, 2024
  • InternVL Public

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. (An open-source multimodal chat model approaching GPT-4o's performance.)

    Python · 6,125 stars · MIT · 478 forks · 143 issues · 4 PRs · Updated Nov 25, 2024
  • MM-NIAH Public

    [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

    Python · 102 stars · 6 forks · 1 issue · 0 PRs · Updated Nov 25, 2024
  • PIIP Public

    [NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)

    Python · 59 stars · MIT · 2 forks · 0 issues · 0 PRs · Updated Nov 21, 2024
  • InternVideo Public

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

    Python · 1,429 stars · Apache-2.0 · 88 forks · 91 issues · 4 PRs · Updated Nov 17, 2024
  • OmniCorpus Public

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Python · 274 stars · 6 forks · 0 issues · 0 PRs · Updated Nov 17, 2024
  • GUI-Odyssey Public

    GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.

    Python · 71 stars · 3 forks · 1 issue · 0 PRs · Updated Nov 12, 2024
  • Vision-RWKV Public

    Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

    Python · 373 stars · Apache-2.0 · 15 forks · 14 issues · 0 PRs · Updated Oct 31, 2024
  • .github Public
    0 stars · 1 fork · 0 issues · 0 PRs · Updated Oct 30, 2024
  • OV-OAD Public Forked from ZQSIAT/OV-OAD

    This repo takes the initial step towards leveraging text learning for online action detection without explicit human supervision.

    1 star · 1 fork · 0 issues · 0 PRs · Updated Oct 28, 2024
