This is the official implementation of the ICCV 2021 paper:
G. Berton, C. Masone, V. Paolicelli and B. Caputo, Viewpoint Invariant Dense Matching for Visual Geolocalization
[ICCV OpenAccess] [ArXiv] [Video] [BibTex]
First, download the baseline models, which were trained following the training procedure of the NetVLAD paper. We provide a script that downloads the six models used, which combine 3 backbone encoders (AlexNet, VGG-16 and ResNet-50) with 2 pooling/aggregation layers (GeM and NetVLAD). The models are automatically saved in data/pretrained_baselines.
python download_pretrained_baselines.py
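To check that the download completed, a small sketch like the following can list which of the six checkpoints are present. Only data/pretrained_baselines/alexnet_gem.pth is confirmed by this README; the spellings `vgg16`, `resnet50` and `netvlad`, and the assumption that every file follows the same `<arch>_<pooling>.pth` pattern, are guesses for illustration.

```python
from itertools import product
from pathlib import Path

# Spellings other than "alexnet" and "gem" are assumptions, not taken from the repo.
BACKBONES = ("alexnet", "vgg16", "resnet50")
POOLINGS = ("gem", "netvlad")


def expected_checkpoints(folder="data/pretrained_baselines"):
    """List the six checkpoint paths, assuming the <arch>_<pooling>.pth pattern."""
    return [Path(folder) / f"{arch}_{pool}.pth"
            for arch, pool in product(BACKBONES, POOLINGS)]


if __name__ == "__main__":
    for path in expected_checkpoints():
        status = "found" if path.is_file() else "missing"
        print(f"{status:8s}{path}")
```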
Then prepare your geo-localization dataset so that the directory tree looks like this:
dataset_name
└── images
    ├── train
    │   ├── gallery
    │   └── queries
    ├── val
    │   ├── gallery
    │   └── queries
    └── test
        ├── gallery
        └── queries
and the images are named as @UTM east@UTM north@[email protected]
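Before training, it can help to verify the layout and the file names. The sketch below is only an illustration under stated assumptions: it checks for the six gallery/queries folders shown in the tree, and it reads the UTM east and north values as the first two `@`-separated fields of a file name. The helper names and the example file name are hypothetical, and whatever follows the third `@` is ignored.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
SUBSETS = ("gallery", "queries")


def missing_dirs(dataset_root):
    """Return the expected images/<split>/<subset> folders that do not exist."""
    images = Path(dataset_root) / "images"
    expected = [images / split / subset for split in SPLITS for subset in SUBSETS]
    return [d for d in expected if not d.is_dir()]


def parse_utm(filename):
    """Read (UTM east, UTM north) from a name of the form @<east>@<north>@..."""
    fields = Path(filename).name.split("@")
    # A leading "@" yields an empty first field, then east, north, and the rest.
    if len(fields) < 4 or fields[0] != "":
        raise ValueError(f"unexpected file name: {filename}")
    return float(fields[1]), float(fields[2])


if __name__ == "__main__":
    # "dataset_name" is the placeholder root used in the tree above.
    for folder in missing_dirs("dataset_name"):
        print("missing:", folder)
    # Hypothetical file name, used only to show the parsing.
    print(parse_utm("@0543256.96@4178906.70@anything.jpg"))
```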
See requirements.txt
You can train the model using train.py. Here is an example with the lightest and fastest model (AlexNet + GeM):
python train.py --arch alexnet --pooling gem --resume_fe data/pretrained_baselines/alexnet_gem.pth
For the full set of options and an explanation of each parameter, run python train.py -h
The script will create a folder under ./runs/default/YYYY-MM-DD_HH-mm-ss where logs and checkpoints will be saved. At the end of training you will see the results with the baseline model, as well as the results when re-ranking is applied using GeoWarp.
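Since each run lands in a folder named after its start time, the most recent run can be located by parsing the folder names. This is a minimal sketch, assuming the YYYY-MM-DD_HH-mm-ss pattern above corresponds to the `strptime` format below; the function names are hypothetical, and folders that do not match the pattern are skipped.

```python
from datetime import datetime
from pathlib import Path

# Assumed strptime equivalent of YYYY-MM-DD_HH-mm-ss.
RUN_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_run_time(name):
    """Return the timestamp encoded in a run folder name, or None if it does not match."""
    try:
        return datetime.strptime(name, RUN_FORMAT)
    except ValueError:
        return None


def latest_run(runs_root="runs/default"):
    """Return the most recent run folder under runs_root, or None if there is none."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    runs = [(parse_run_time(p.name), p) for p in root.iterdir() if p.is_dir()]
    runs = [(stamp, p) for stamp, p in runs if stamp is not None]
    return max(runs, key=lambda item: item[0])[1] if runs else None


if __name__ == "__main__":
    print(latest_run())
```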
You can use this code to compute the results with our trained models. To reproduce the results from the paper, download our models by running
python download_trained_hom_reg.py
which automatically downloads the models and saves them under data/trained_homography_regressions. Then, to obtain the results, run
python eval.py --arch alexnet --pooling gem --resume_fe data/pretrained_baselines/alexnet_gem.pth --resume_hr data/trained_homography_regressions/alexnet_gem.pth
This will give you the exact same results as in Table 1 of the paper.
For the full set of options and an explanation of each parameter, run python eval.py -h
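To evaluate a different backbone/pooling pair, the same command can be assembled programmatically. The sketch below only builds the argument list from the flags shown above and does not run anything. The helper name is hypothetical, and it assumes that both checkpoints are always named `<arch>_<pooling>.pth`, which this README confirms only for AlexNet + GeM.

```python
import shlex


def eval_command(arch, pooling,
                 baselines_dir="data/pretrained_baselines",
                 hom_reg_dir="data/trained_homography_regressions"):
    """Build the eval.py argument list for one backbone/pooling pair."""
    name = f"{arch}_{pooling}.pth"  # assumed naming pattern
    return ["python", "eval.py",
            "--arch", arch,
            "--pooling", pooling,
            "--resume_fe", f"{baselines_dir}/{name}",
            "--resume_hr", f"{hom_reg_dir}/{name}"]


if __name__ == "__main__":
    # Prints the same command as the AlexNet + GeM example above.
    print(shlex.join(eval_command("alexnet", "gem")))
```

The list form can be passed directly to `subprocess.run` if you want to launch the evaluation from Python.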
You can generate and visualize self-supervised data from a single image by running
python visualize_ss_data.py --image_path data/example.jpg --k 0.8
The script generates four images (notation is consistent with the paper):
- ./data/ss_img_source.jpg: the source image I, with the visualization of the two quadrilaterals tx (orange) and ty (purple) and their intersection tz (green) as defined in the paper;
- ./data/ss_proj_a.jpg: the first projection Ia, with the projection ta of the intersection (green);
- ./data/ss_proj_b.jpg: the second projection Ib, with the projection tb of the intersection (green);
- ./data/ss_proj_intersection.jpg: the projection of the intersection.
You can change the value of k to see how this influences the training data.
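To give an intuition for what a projection of a quadrilateral involves, here is a rough sketch, assuming each projection is a perspective warp (homography) that maps the four corners of a quadrilateral onto the corners of a rectangle. It is not the code in visualize_ss_data.py: the function names and the corner coordinates are hypothetical, and it does not model how k or the quadrilaterals are sampled.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping four src points to four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with the last entry of H fixed to 1.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in H)
    return u / w, v / w


if __name__ == "__main__":
    # Hypothetical quadrilateral in the source image and a 100x100 target square.
    quad = [(10.0, 20.0), (200.0, 30.0), (220.0, 180.0), (5.0, 160.0)]
    square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
    H = homography_from_points(quad, square)
    for corner in quad:
        print(corner, "->", apply_homography(H, corner))
```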
Example of randomly generated images:
Source image | Projection A | Projection B | Projected intersection |
---|---|---|---|
If you use this code in your project, please cite us using:
@InProceedings{Berton_ICCV_2021,
    author    = {Berton, Gabriele and Masone, Carlo and Paolicelli, Valerio and Caputo, Barbara},
    title     = {Viewpoint Invariant Dense Matching for Visual Geolocalization},
    booktitle = {ICCV},
    month     = {October},
    year      = {2021},
    pages     = {12169-12178}
}