NOTE: We do not claim ownership of most of this repository; full credit goes to its original authors, and most of this README is copied directly from their repository. Our additions are the RDEI and RDPI algorithms, the list of compared algorithms below, and the instructions for running all experiments sequentially inside Docker via `run_all.py`.
This repository contains the code used to generate the results in the paper *Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB*.
It provides a modular framework that makes the implementations of the algorithms compared in the paper compatible with IOHprofiler and logs their performance.
The compared algorithms are:
- Vanilla Bayesian Optimization, taken from the Python module `scikit-optimize`.
- CMA-ES from the `pycma` package.
- Random search, taken from the Python module `numpy` using the method `random.uniform`.
- The SAASBO algorithm from *High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces*.
- RDUCB, introduced in *Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?*.
- RDPI and RDEI, variations of RDUCB that use other acquisition functions: probability of improvement and expected improvement, respectively.
- PCA-BO, proposed in *High Dimensional Bayesian Optimization Assisted by Principal Component Analysis*.
- KPCA-BO, introduced in *High Dimensional Bayesian Optimization with Kernel Principal Component Analysis*.
- TuRBO from *Scalable Global Optimization via Local Bayesian Optimization*.
This code compares these approaches on the 24 functions of the Black-Box Optimization Benchmarking (BBOB) suite from the COCO benchmarking environment, using their definition from IOHprofiler. It is based on the original repositories and modules of the selected algorithms: vanilla Bayesian Optimization, CMA-ES, random search, SAASBO, RDUCB, PCA-BO, KPCA-BO, and TuRBO.
The implementation is in Python 3.10.12 and all the libraries used are listed in `requirements.txt`.
The repository is organized as follows:
- `run_experiment.py` is the main file, used to run any experiment. It initializes the main settings of the experiment, calls the chosen algorithm, and writes the log files. It takes as argument a `.json` file produced by `gen_config.py`.
- `wrapper.py` contains the definition of all algorithms and the method `wrapopt` that runs the main loop of the chosen algorithm. It is called by `run_experiment.py`.
- `my_logger.py` defines all the functions needed to generate the log files that store the output data of a run. It is called by `run_experiment.py`.
- `total_config.json` allows the user to define the settings of an experiment. It is taken as an argument by `gen_config.py`.
- `gen_config.py` generates a folder called `configs` containing the files needed to run the experiments defined by the settings in `total_config.json`.
- `run_all.py` runs all experiments in the first folder it finds in the Docker container that contains the experiment configurations. It runs them locally and sequentially.
- `mylib` stores the libraries with the implementations of the compared algorithms.
- `bayes_optim.zip` contains the bayes-optim package, with slight modifications to track the CPU time for CMA-ES.
- `Bayesian-Optimization.zip` contains the cloned repository Bayesian-Optimization, with some changes to track the CPU time for the algorithms PCA-BO and KPCA-BO.
- `RDUCB.zip` contains the cloned repository RDUCB, with modifications to track the CPU time for the algorithm RDUCB.
- `RDEI.zip` contains the modified version of RDUCB implementing RDEI (random decomposition with the expected improvement acquisition function).
- `RDPI.zip` contains the modified version of RDUCB implementing RDPI (random decomposition with the probability of improvement acquisition function).
- `GPy.zip` and `GPyOpt.zip` contain the modules GPy and GPyOpt, respectively, with modifications to track the CPU time for the algorithm RDUCB.
- `skopt.zip` contains the module skopt, with some changes to track the CPU time for vanilla Bayesian Optimization.
- `requirements.txt` lists all the project's dependencies with the specific version of each dependency.
Running this code from source requires Python 3.10.12 and the libraries given in `requirements.txt`. (Warning: preferably use a virtual environment for this project, to avoid breaking the dependencies of your other projects.) On Ubuntu, the dependencies can be installed with `pip install -r requirements.txt`.
To correctly track the CPU time, this code needs some modified modules and modified cloned repositories. Follow the steps below:
- Unzip the archives `bayes_optim.zip`, `skopt.zip`, `Bayesian-Optimization.zip`, `GPy.zip`, `GPyOpt.zip`, `RDUCB.zip`, `RDPI.zip`, and `RDEI.zip`:

  ```
  unzip bayes_optim.zip
  unzip skopt.zip
  unzip Bayesian-Optimization.zip
  unzip GPy.zip
  unzip GPyOpt.zip
  unzip RDUCB.zip
  unzip RDPI.zip
  unzip RDEI.zip
  ```
- Find the path of the used Python site-packages directory with `python -m site`.
- Move `bayes_optim`, `skopt`, `GPy`, and `GPyOpt` to the used Python site-packages directory:

  ```
  mv bayes_optim <found_path_site_packages>
  mv skopt <found_path_site_packages>
  mv GPy <found_path_site_packages>
  mv GPyOpt <found_path_site_packages>
  ```
- Move `Bayesian-Optimization`, `RDUCB`, `RDPI`, and `RDEI` to the corresponding library folders inside the project:

  ```
  mv Bayesian-Optimization mylib/lib_BO_bayesoptim
  mv RDUCB mylib/lib_RDUCB/HEBO
  mv RDPI mylib/lib_RDPI/HEBO
  mv RDEI mylib/lib_RDEI/HEBO
  ```
First of all, use the file `total_config.json` to define the settings of your experiment:
- `folder` is the prefix of the folders that are generated to store the results of the experiment.
- `optimizers` is a list containing one string per algorithm to be tested in the experiment. Possible names are `BO_sklearn`, `pyCMA`, `random`, `saasbo`, `RDUCB`, `RDEI`, `RDPI`, `linearPCABO`, `KPCABO`, `turbo1`, and `turbom`.
- `fiids` is a list of the functions to be optimized. It can contain a single integer or multiple integers identifying the 24 BBOB functions.
- `iids` is a list of problem instances. In our paper, 0, 1, and 2 are considered.
- `dims` is a list of the problem dimensions tested in the experiment.
- `reps` is the number of run repetitions performed under the same settings (optimizer, function id, instance id, etc.). Repetitions differ only in their random seeds; the seed numbering starts from 0.
- `lb` and `ub` are the lower and upper bounds of the search space along each dimension. In the paper, they are fixed to -5 and 5, respectively.
- `extra` contains extra text information to store in the result folder.

A sketch of such a configuration file is given below.
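For illustration only, the following Python snippet writes a `total_config.json` with the fields described above; the concrete values (algorithms, functions, dimensions, bounds) are placeholders, not the settings used in the paper, and the exact schema expected by `gen_config.py` should be checked against that script.

```python
# Illustrative only: write a total_config.json with the fields described above.
# The values below are placeholders; adapt them to your own experiment.
import json

config = {
    "folder": "test_run",                 # prefix of the generated result folders
    "optimizers": ["turbo1", "random"],   # algorithms to compare
    "fiids": [1, 8, 21],                  # BBOB function ids (1-24)
    "iids": [0, 1, 2],                    # problem instances
    "dims": [10, 40],                     # problem dimensions
    "reps": 5,                            # repetitions per setting (seeds 0, 1, ..., 4)
    "lb": -5,                             # lower bound of the search space
    "ub": 5,                              # upper bound of the search space
    "extra": "example configuration",     # free-text note stored with the results
}

with open("total_config.json", "w") as f:
    json.dump(config, f, indent=2)
```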
Results will be generated inside a `run_[current_date_and_time]` folder. This contains a `configs` subfolder storing as many `.json` files as the total number of different settings defined by all the combinations of parameters in `total_config.json`. The name of each `.json` file describes the specific setting: optimizer (Opt), function (F), instance (Id), dimension (Dim), repetition (Rep), and a numerical experiment identifier that numbers the tested settings in ascending order (NumExp). For example, `Opt-turbo1_F-1_Id-0_Dim-10_Rep-0_NumExp-0.json`. A small helper for parsing these names is sketched below.
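If you need to recover a setting from such a filename in your own scripts, a minimal helper (assuming only the naming convention shown above) could look like this:

```python
# Minimal sketch: split a config filename into its fields, assuming the
# Opt-..._F-..._Id-..._Dim-..._Rep-..._NumExp-... convention shown above.
import re

NAME_RE = re.compile(
    r"Opt-(?P<Opt>.+)_F-(?P<F>\d+)_Id-(?P<Id>\d+)_Dim-(?P<Dim>\d+)"
    r"_Rep-(?P<Rep>\d+)_NumExp-(?P<NumExp>\d+)\.json$"
)

def parse_config_name(filename: str) -> dict:
    match = NAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected config name: {filename}")
    return match.groupdict()

print(parse_config_name("Opt-turbo1_F-1_Id-0_Dim-10_Rep-0_NumExp-0.json"))
# -> {'Opt': 'turbo1', 'F': '1', 'Id': '0', 'Dim': '10', 'Rep': '0', 'NumExp': '0'}
```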
If a job scheduling system for Linux clusters is available, the batch script to be edited is inside `gen_config.py`. The parameters in this file must be chosen, and the batch script edited, before launching any jobs.
Use the command `python gen_config.py total_config.json` to generate a folder `run_[current_date_and_time]` containing the `configs` subfolder described above. The same command also prints a line to the screen. Copy-paste the printed line (for example, `cd [path-to-root-folder]/run_15-04-2024_16h14m59s && for (( i=0; i<1; ++i )); do sbatch slurm$i.sh; done`) into your terminal to start as many jobs as there are setting combinations specified in `total_config.json`. At this point, the folders containing the results are generated inside `run_[current_date_and_time]`.
If a job scheduling system is not available or not needed, follow the steps below. In this case, there is no need to adjust the batch-script settings by editing the file `gen_config.py`. Again, after defining the experimental setup in `total_config.json`, run the command `python gen_config.py total_config.json` to generate the folder `run_[current_date_and_time]` containing the `configs` subfolder described above. The command also prints to the screen the experiment root folder `run_[current_date_and_time]` and how many files were generated, i.e., how many different settings are considered.
A single run for a specific setting can then be started with `python run_experiment.py run_[current_date_and_time]/configs/[setting_you_want_to_run].json`, and the folder containing its results is generated inside `run_[current_date_and_time]`. A sketch of how all generated configurations could be run sequentially outside Docker is given below.
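The provided `run_all.py` runs all configurations sequentially inside the Docker container; if you want a similar loop outside Docker, a minimal sketch (assuming only the `run_[current_date_and_time]/configs/*.json` layout described above) could look like this:

```python
# Minimal sketch: run every generated configuration sequentially, assuming the
# run_[current_date_and_time]/configs/*.json layout described above.
# This only illustrates the loop; the provided run_all.py serves the same
# purpose inside the Docker container.
import glob
import subprocess
import sys

run_folder = sys.argv[1]  # e.g. "run_15-04-2024_16h14m59s"

for config in sorted(glob.glob(f"{run_folder}/configs/*.json")):
    print(f"Running {config}")
    subprocess.run([sys.executable, "run_experiment.py", config], check=True)
```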
To simplify the process of setting up and running the application, you can use Docker. The provided Dockerfile will create a consistent environment, ensuring that the application runs smoothly regardless of the host system. Follow the steps below:
- Install Docker on the system.
- Build the Docker image and run the Docker container by executing the following commands:

  ```
  docker build --network=host . -t hdbo-docker
  docker run -it hdbo-docker bash
  ```
- After this, run the usual command `python gen_config.py total_config.json` to generate the configuration files.
- To start a single run, use the usual command `python run_experiment.py run_[current_date_and_time]/configs/[setting_you_want_to_run].json`, or run all the configurations with `python run_all.py`.
Each of the folders generated inside `run_[current_date_and_time]` contains a subfolder `data_[number_and_name_of_the_function]` that stores a `.dat` file generated by the logger. It tracks the loss evolution (under the name `best-so-far f(x)`) and the CPU times for a specific run. These are the metrics used in our paper to compare the performance of the different algorithms. A sketch of how such a `.dat` file could be read is given below.
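The exact layout of the `.dat` files is defined by `my_logger.py`; as a rough illustration, assuming a whitespace-separated file whose header row names a `best-so-far f(x)` column, the loss trajectory could be extracted like this:

```python
# Rough sketch: extract the "best-so-far f(x)" column from a logger .dat file,
# assuming a whitespace-separated layout with a header row that names the
# columns (the exact format is defined by my_logger.py; adapt as needed).
import csv

def read_best_so_far(dat_path: str) -> list[float]:
    with open(dat_path) as f:
        rows = list(csv.reader(f, delimiter=" ", skipinitialspace=True))
    header, data = rows[0], rows[1:]
    col = header.index("best-so-far f(x)")
    # Skip any repeated header rows that may be written between runs.
    return [float(row[col]) for row in data if row and row[col] != "best-so-far f(x)"]

values = read_best_so_far("example.dat")  # path to one .dat file
print(f"{len(values)} evaluations, final best-so-far: {values[-1]}")
```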
To get the plots shown in the paper, run `python visualize_results.py` with a `./results/` folder containing the results of the experiments, i.e., the generated folder structure with the `.dat` files described above.