
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"


🤩 News

  • 2024.07.12: X-Pose supports controllable animal face animation. See details here.

  • 2024.07.02: X-Pose is accepted to ECCV 2024. (We renamed the model from UniPose to X-Pose to avoid confusion with earlier works of similar names.)

  • 2024.02.14: We updated a file listing all 1,237 classes in the UniKPT dataset.

  • 2023.11.28: This figure highlights X-Pose's ability to detect 68 face keypoints across arbitrary categories. The face keypoint definition follows this dataset.

  • 2023.11.9: Thanks to OpenXLab, you can now try a quick online demo. We look forward to your feedback!

  • 2023.11.1: We released the inference code, demo, checkpoints, and annotations of the UniKPT dataset.

  • 2023.10.13: We released the arXiv version.

In-the-wild Test via X-Pose

X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.


Detecting any Face Keypoints:


🗒 TODO

  • Release inference code and demo.
  • Release checkpoints.
  • Release UniKPT annotations.
  • Release training code.

💡 Overview

• X-Pose is the first end-to-end prompt-based keypoint detection framework.


• It supports multi-modal prompts, both textual and visual, to detect arbitrary keypoints (e.g., on articulated, rigid, and soft objects).

Visual Prompts as Inputs:


Textual Prompts as Inputs:


🔨 Environment Setup

  1. Clone this repo
git clone https://github.com/IDEA-Research/X-Pose.git
cd X-Pose
  2. Install the needed packages
pip install -r requirements.txt
  3. Compile the CUDA operators
cd models/UniPose/ops
python setup.py build install
# unit test (all checks should report True)
python test.py
cd ../../..
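Before compiling the CUDA operators, it can help to confirm that the build prerequisites are present. This is a minimal sketch, assuming (as is usual for PyTorch CUDA extensions built with `python setup.py build install`) that the build needs an importable `torch` package and the CUDA compiler `nvcc` on `PATH`. The function `preflight_check` is hypothetical and not part of the repository.

```python
import importlib.util
import shutil


def preflight_check() -> dict:
    """Report whether tools typically needed to build the CUDA ops are available.

    Assumption: models/UniPose/ops is built with PyTorch's C++/CUDA extension
    tooling, which requires torch to be importable and nvcc to be on PATH.
    """
    return {
        # True if the `torch` package can be found in this environment.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # True if the CUDA compiler executable is found on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight_check().items():
        print(f"{name}: {ok}")
```

If either value is False, fix the environment first; the unit test in `test.py` remains the authoritative check that the compiled operators work.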

▶ Demo

1. Guidelines

• We have released the textual prompt-based branch for inference. As the visual prompt involves a substantial amount of user input, we are currently exploring more user-friendly platforms to support this functionality.

• Since X-Pose has learned strong structural priors, it works best when you use the predefined skeletons in predefined_keypoints.py as keypoint textual prompts.

• If no keypoint prompt is provided, we try to match an appropriate skeleton to the given instance category. If no match is found, we fall back to the animal skeleton, which covers a wider range of categories and test cases.
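The prompt-selection order in these guidelines can be sketched as follows. This is an illustration only: `PREDEFINED_SKELETONS` is a hypothetical, truncated stand-in for predefined_keypoints.py, `choose_keypoint_prompt` is a hypothetical helper, and the exact-match lookup is a simplification of whatever matching the repository actually performs.

```python
from typing import Dict, List, Optional

# Hypothetical, truncated stand-in for predefined_keypoints.py.
# The real file's variable names and keypoint lists may differ.
PREDEFINED_SKELETONS: Dict[str, List[str]] = {
    "person": ["nose", "left eye", "right eye", "left ear", "right ear"],
    "animal": ["left eye", "right eye", "nose", "neck", "root of tail"],
}


def choose_keypoint_prompt(category: str,
                           keypoint_prompt: Optional[List[str]] = None) -> List[str]:
    """Pick keypoint names: user prompt, then category match, then animal fallback."""
    # 1. A user-supplied keypoint prompt always takes precedence.
    if keypoint_prompt:
        return keypoint_prompt
    # 2. Otherwise look up a predefined skeleton by (normalized) instance category.
    key = category.strip().lower()
    if key in PREDEFINED_SKELETONS:
        return PREDEFINED_SKELETONS[key]
    # 3. Fall back to the animal skeleton, which covers the most categories.
    return PREDEFINED_SKELETONS["animal"]


if __name__ == "__main__":
    print(choose_keypoint_prompt("Person"))  # matched by category
    print(choose_keypoint_prompt("zebra"))   # no match, animal fallback
```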

2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, "dir you want to save the output", "instance categories", and "keypoint_skeleton_text" with appropriate values in the following command:

CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
-c config/UniPose_SwinT.py \
-p weights/unipose_swint.pth \
-i image_you_want_to_test.jpg \
-o "dir you want to save the output" \
-t "instance categories" \
-k "keypoint_skeleton_text"

where
• -t takes the instance categories, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".
• -k takes the keypoint skeleton text. If necessary, select an option from predefined_keypoints.py.
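If you drive the demo from Python, building the argument list explicitly avoids shell-quoting problems with values that contain spaces, such as the output directory. This is a minimal sketch: `build_inference_command` and `run_inference` are hypothetical helpers, and they use only the -c/-p/-i/-o/-t/-k flags shown in the command above.

```python
import os
import shlex
import subprocess
from typing import List, Optional


def build_inference_command(image: str, output_dir: str, categories: str,
                            keypoints: Optional[str] = None,
                            config: str = "config/UniPose_SwinT.py",
                            checkpoint: str = "weights/unipose_swint.pth") -> List[str]:
    """Assemble the argument list for inference_on_a_image.py (hypothetical helper)."""
    cmd = ["python", "inference_on_a_image.py",
           "-c", config, "-p", checkpoint,
           "-i", image, "-o", output_dir,
           "-t", categories]
    # -k is optional; without it the demo picks a skeleton from the category.
    if keypoints is not None:
        cmd += ["-k", keypoints]
    return cmd


def run_inference(cmd: List[str], gpu_id: str = "0") -> None:
    """Run the command on the chosen GPU, raising CalledProcessError on failure."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu_id}
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    cmd = build_inference_command("horse.jpg", "outputs/horse demo", "horse")
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

Passing a list (rather than a single string) to `subprocess.run` means each value reaches the script as one argument, so no manual quoting is needed.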

We also support inference with Gradio:

python app.py

Checkpoints

| Name | Backbone | Keypoint AP on COCO | Checkpoint | Config |
|---|---|---|---|---|
| X-Pose | Swin-T | 74.4 | Google Drive / OpenXLab | GitHub Link |
| X-Pose | Swin-L | 76.8 | Coming Soon | Coming Soon |
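Under the standard COCO protocol, keypoint AP is averaged over Object Keypoint Similarity (OKS) thresholds 0.50, 0.55, ..., 0.95, where OKS = Σ_i exp(-d_i² / (2 s² k_i²)) δ(v_i > 0) / Σ_i δ(v_i > 0), with d_i the distance between predicted and ground-truth keypoint i, s² the object area, and k_i = 2σ_i a per-keypoint constant. The sketch below computes OKS for one prediction/ground-truth pair following the formula used in pycocotools; it is not X-Pose's evaluation code, and `oks` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple

# Per-keypoint sigmas for the 17 COCO person keypoints, as used by pycocotools.
COCO_PERSON_SIGMAS = [s / 10.0 for s in (
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89)]


def oks(pred: Sequence[Tuple[float, float]],
        gt: Sequence[Tuple[float, float]],
        visibility: Sequence[int],
        area: float,
        sigmas: Sequence[float]) -> float:
    """Object Keypoint Similarity between one prediction and one ground truth.

    Only ground-truth keypoints with visibility > 0 contribute. If none are
    labeled, this sketch returns 0.0 (pycocotools handles that case differently).
    """
    if not (len(pred) == len(gt) == len(visibility) == len(sigmas)):
        raise ValueError("pred, gt, visibility and sigmas must have equal length")
    if area <= 0:
        raise ValueError("area must be positive")
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled ground-truth keypoints are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # per-keypoint falloff constant
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0
```

A perfect prediction gives an OKS of 1.0; the larger the object area, the more pixel error a keypoint tolerates before its score drops.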

The UniKPT Dataset


| Dataset | Keypoints | Classes | Images | Instances | Unified Images | Unified Instances |
|---|---|---|---|---|---|---|
| COCO | 17 | 1 | 58,945 | 156,165 | 58,945 | 156,165 |
| 300W-Face | 68 | 1 | 3,837 | 4,437 | 3,837 | 4,437 |
| OneHand10K | 21 | 1 | 11,703 | 11,289 | 2,000 | 2,000 |
| Human-Art | 17 | 1 | 50,000 | 123,131 | 50,000 | 123,131 |
| AP-10K | 17 | 54 | 10,015 | 13,028 | 10,015 | 13,028 |
| APT-36K | 17 | 30 | 36,000 | 53,006 | 36,000 | 53,006 |
| MacaquePose | 17 | 1 | 13,083 | 16,393 | 2,000 | 2,320 |
| Animal Kingdom | 23 | 850 | 33,099 | 33,099 | 33,099 | 33,099 |
| AnimalWeb | 9 | 332 | 22,451 | 21,921 | 22,451 | 21,921 |
| Vinegar Fly | 31 | 1 | 1,500 | 1,500 | 1,500 | 1,500 |
| Desert Locust | 34 | 1 | 700 | 700 | 700 | 700 |
| Keypoint-5 | 55/31 | 5 | 8,649 | 8,649 | 2,000 | 2,000 |
| MP-100 | 561/293 | 100 | 16,943 | 18,000 | 16,943 | 18,000 |
| UniKPT | 338 | 1237 | - | - | 226,547 | 418,487 |

• UniKPT unifies 13 existing datasets and is intended for non-commercial research purposes only.

• All images in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from the original repositories.

• We provide annotations with precise textual descriptions of each keypoint for effective training. The text annotations are available at the link.
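If the released annotation files follow the COCO keypoint JSON convention (an assumption here; check the released files), each annotation stores keypoints as a flat [x1, y1, v1, x2, y2, v2, ...] list, where v = 0 means unlabeled, 1 labeled but not visible, and 2 labeled and visible, and each category lists its keypoint names under "keypoints". The helpers below (`parse_keypoints`, `labeled_keypoint_names`, `load_coco_style`) are hypothetical, and any extra fields UniKPT adds for keypoint text descriptions are not handled.

```python
import json
from typing import Dict, List, Tuple


def parse_keypoints(flat: List[float]) -> List[Tuple[float, float, int]]:
    """Split a COCO-style flat keypoint list into (x, y, visibility) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints list length must be a multiple of 3")
    return [(flat[i], flat[i + 1], int(flat[i + 2])) for i in range(0, len(flat), 3)]


def labeled_keypoint_names(annotation: dict, categories: Dict[int, dict]) -> List[str]:
    """Names of the keypoints labeled (visibility > 0) in one annotation."""
    names = categories[annotation["category_id"]]["keypoints"]
    points = parse_keypoints(annotation["keypoints"])
    return [name for name, (_, _, v) in zip(names, points) if v > 0]


def load_coco_style(path: str):
    """Load annotations and an id -> category mapping from a COCO-style JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    categories = {c["id"]: c for c in data["categories"]}
    return data["annotations"], categories
```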

Citing X-Pose

If you find this repository useful for your work, please consider citing it as follows:

@inproceedings{xpose,
  title={X-Pose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2022}
}
