NOTE: We do not claim ownership of most of this repository; full credit goes to its original authors, and most of this README is copied directly from their repository. Our additions are the RDEI and RDPI algorithms, the list of compared algorithms below, and the instructions for running all experiments sequentially inside Docker via `run_all.py`.
This repository contains the code used to generate the results in the paper *Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB*.
It provides a modular framework that makes the implementations of the algorithms compared in the paper compatible with IOHprofiler and logs their performance.
The compared algorithms are:
- Vanilla Bayesian Optimization, taken from the Python module `scikit-optimize`.
- CMA-ES from the `pycma` package.
- Random search, taken from the Python module `numpy` using the method `random.uniform`.
- The SAASBO algorithm from *High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces*.
- RDUCB, introduced in *Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?*.
- RDPI and RDEI, variations of RDUCB that use other acquisition functions: probability of improvement and expected improvement, respectively.
- PCA-BO, proposed in *High Dimensional Bayesian Optimization Assisted by Principal Component Analysis*.
- KPCA-BO, introduced in *High Dimensional Bayesian Optimization with Kernel Principal Component Analysis*.
- TuRBO from *Scalable Global Optimization via Local Bayesian Optimization*.
This code compares these approaches on the 24 functions of the Black-Box Optimization Benchmarking (BBOB) suite from the COCO benchmarking environment, using their definition from IOHprofiler. It is based on the original repositories and modules of the selected algorithms: vanilla Bayesian Optimization, CMA-ES, random search, SAASBO, RDUCB, PCA-BO, KPCA-BO, and TuRBO.
The implementation is in Python 3.10.12 and all the libraries used are listed in `requirements.txt`.
The repository is organized as follows:
- `run_experiment.py` is the main file, used to run any experiment. It initializes the main settings of the experiment, calls the chosen algorithm, and writes the log files. It takes as argument a `.json` file produced by `gen_config.py`.
- `wrapper.py` contains the definition of all algorithms and the method `wrapopt` that runs the main loop of the chosen algorithm. It is called by `run_experiment.py`.
- `my_logger.py` defines all the functions needed to generate the log files that store the output data of a run. It is called by `run_experiment.py`.
- `total_config.json` allows the user to define the settings of an experiment. It is taken as an argument by `gen_config.py`.
- `gen_config.py` generates a folder called `configs` containing the files needed to run the experiments defined by the settings in `total_config.json`.
- `run_all.py` runs all experiments in the first folder it finds in the Docker container that contains the experiment configurations. It runs them locally and sequentially.
- `mylib` stores the libraries with the implementations of the compared algorithms.
- `bayes_optim.zip` contains the bayes-optim package, with slight modifications to track the CPU time for CMA-ES.
- `Bayesian-Optimization.zip` contains the cloned repository Bayesian-Optimization, with some changes to track the CPU time for the algorithms PCA-BO and KPCA-BO.
- `RDUCB.zip` contains the cloned repository RDUCB, with modifications to track the CPU time for the algorithm RDUCB.
- `RDEI.zip` contains the modified version of RDUCB implementing RDEI (random decomposition with the expected improvement acquisition function).
- `RDPI.zip` contains the modified version of RDUCB implementing RDPI (random decomposition with the probability of improvement acquisition function).
- `GPy.zip` and `GPyOpt.zip` contain the modules GPy and GPyOpt, respectively, with modifications to track the CPU time for the algorithm RDUCB.
- `skopt.zip` contains the module skopt, with some changes to track the CPU time for vanilla Bayesian Optimization.
- `requirements.txt` lists all the project's dependencies with the specific version of each dependency.
Running this code from source requires Python 3.10.12 and the libraries given in `requirements.txt`. (Warning: preferably use a virtual environment for this project, to avoid breaking the dependencies of your other projects.) On Ubuntu, the dependencies can be installed with `pip install -r requirements.txt`.
To correctly track the CPU time, this code needs some modified modules and modified cloned repositories. Follow the steps below:
- Unzip the archives `bayes_optim.zip`, `skopt.zip`, `Bayesian-Optimization.zip`, `GPy.zip`, `GPyOpt.zip`, `RDUCB.zip`, `RDPI.zip`, and `RDEI.zip`:

  ```
  unzip bayes_optim.zip
  unzip skopt.zip
  unzip Bayesian-Optimization.zip
  unzip GPy.zip
  unzip GPyOpt.zip
  unzip RDUCB.zip
  unzip RDPI.zip
  unzip RDEI.zip
  ```
- Find the path of the used Python site-packages directory with `python -m site`.
- Move `bayes_optim`, `skopt`, `GPy`, and `GPyOpt` to the used Python site-packages directory:

  ```
  mv bayes_optim <found_path_site_packages>
  mv skopt <found_path_site_packages>
  mv GPy <found_path_site_packages>
  mv GPyOpt <found_path_site_packages>
  ```
- Move `Bayesian-Optimization`, `RDUCB`, `RDPI`, and `RDEI` to the corresponding library folders inside the project:

  ```
  mv Bayesian-Optimization mylib/lib_BO_bayesoptim
  mv RDUCB mylib/lib_RDUCB/HEBO
  mv RDPI mylib/lib_RDPI/HEBO
  mv RDEI mylib/lib_RDEI/HEBO
  ```
First of all, use the file `total_config.json` to define the settings of your experiment:
- `folder` is the prefix of the folders that are generated to store the results of the experiment.
- `optimizers` is a list containing one string per algorithm to be tested in the experiment. Possible names are `BO_sklearn`, `pyCMA`, `random`, `saasbo`, `RDUCB`, `RDEI`, `RDPI`, `linearPCABO`, `KPCABO`, `turbo1`, and `turbom`.
- `fiids` is a list of the functions to be optimized. It can contain a single integer or multiple integers identifying the 24 BBOB functions.
- `iids` is a list of problem instances. In our paper, 0, 1, and 2 are considered.
- `dims` is a list of the problem dimensions tested in the experiment.
- `reps` is the number of run repetitions performed under the same settings (optimizer, function id, instance id, etc.). Repetitions differ only in their random seeds; the seed numbering starts from 0.
- `lb` and `ub` are the lower and upper bounds of the search space along each dimension. In the paper, they are fixed to -5 and 5, respectively.
- `extra` contains extra text information to store in the result folder.

A sketch of such a configuration file is given below.
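For illustration only, the following Python snippet writes a `total_config.json` with the fields described above; the concrete values (algorithms, functions, dimensions, bounds) are placeholders, not the settings used in the paper, and the exact schema expected by `gen_config.py` should be checked against that script.

```python
# Illustrative only: write a total_config.json with the fields described above.
# The values below are placeholders; adapt them to your own experiment.
import json

config = {
    "folder": "test_run",                 # prefix of the generated result folders
    "optimizers": ["turbo1", "random"],   # algorithms to compare
    "fiids": [1, 8, 21],                  # BBOB function ids (1-24)
    "iids": [0, 1, 2],                    # problem instances
    "dims": [10, 40],                     # problem dimensions
    "reps": 5,                            # repetitions per setting (seeds 0, 1, ..., 4)
    "lb": -5,                             # lower bound of the search space
    "ub": 5,                              # upper bound of the search space
    "extra": "example configuration",     # free-text note stored with the results
}

with open("total_config.json", "w") as f:
    json.dump(config, f, indent=2)
```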
Results will be generated inside a `run_[current_date_and_time]` folder. This contains a `configs` subfolder storing as many `.json` files as the total number of different settings defined by all the combinations of parameters in `total_config.json`. The name of each `.json` file describes the specific setting: optimizer (Opt), function (F), instance (Id), dimension (Dim), repetition (Rep), and a numerical experiment identifier that numbers the tested settings in ascending order (NumExp). For example, `Opt-turbo1_F-1_Id-0_Dim-10_Rep-0_NumExp-0.json`. A small helper for parsing these names is sketched below.
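If you need to recover a setting from such a filename in your own scripts, a minimal helper (assuming only the naming convention shown above) could look like this:

```python
# Minimal sketch: split a config filename into its fields, assuming the
# Opt-..._F-..._Id-..._Dim-..._Rep-..._NumExp-... convention shown above.
import re

NAME_RE = re.compile(
    r"Opt-(?P<Opt>.+)_F-(?P<F>\d+)_Id-(?P<Id>\d+)_Dim-(?P<Dim>\d+)"
    r"_Rep-(?P<Rep>\d+)_NumExp-(?P<NumExp>\d+)\.json$"
)

def parse_config_name(filename: str) -> dict:
    match = NAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected config name: {filename}")
    return match.groupdict()

print(parse_config_name("Opt-turbo1_F-1_Id-0_Dim-10_Rep-0_NumExp-0.json"))
# -> {'Opt': 'turbo1', 'F': '1', 'Id': '0', 'Dim': '10', 'Rep': '0', 'NumExp': '0'}
```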
If a job scheduling system for Linux clusters is available, the batch script to be edited is inside `gen_config.py`. The parameters in this file must be chosen, and the batch script edited, before launching any jobs.
Use the command `python gen_config.py total_config.json` to generate a folder `run_[current_date_and_time]` containing the `configs` subfolder described above. The same command also prints a line to the screen. Copy-paste the printed line (for example, `cd [path-to-root-folder]/run_15-04-2024_16h14m59s && for (( i=0; i<1; ++i )); do sbatch slurm$i.sh; done`) into your terminal to start as many jobs as there are setting combinations specified in `total_config.json`. At this point, the folders containing the results are generated inside `run_[current_date_and_time]`.
If a job scheduling system is not available or not needed, follow the steps below. In this case, there is no need to adjust the batch-script settings by editing the file `gen_config.py`. Again, after defining the experimental setup in `total_config.json`, run the command `python gen_config.py total_config.json` to generate the folder `run_[current_date_and_time]` containing the `configs` subfolder described above. The command also prints to the screen the experiment root folder `run_[current_date_and_time]` and how many files were generated, i.e., how many different settings are considered.
A single run for a specific setting can then be started with `python run_experiment.py run_[current_date_and_time]/configs/[setting_you_want_to_run].json`, and the folder containing its results is generated inside `run_[current_date_and_time]`. A sketch of how all generated configurations could be run sequentially outside Docker is given below.
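The provided `run_all.py` runs all configurations sequentially inside the Docker container; if you want a similar loop outside Docker, a minimal sketch (assuming only the `run_[current_date_and_time]/configs/*.json` layout described above) could look like this:

```python
# Minimal sketch: run every generated configuration sequentially, assuming the
# run_[current_date_and_time]/configs/*.json layout described above.
# This only illustrates the loop; the provided run_all.py serves the same
# purpose inside the Docker container.
import glob
import subprocess
import sys

run_folder = sys.argv[1]  # e.g. "run_15-04-2024_16h14m59s"

for config in sorted(glob.glob(f"{run_folder}/configs/*.json")):
    print(f"Running {config}")
    subprocess.run([sys.executable, "run_experiment.py", config], check=True)
```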
To simplify the process of setting up and running the application, you can use Docker. The provided Dockerfile will create a consistent environment, ensuring that the application runs smoothly regardless of the host system. Follow the steps below:
- Install Docker on the system.
- Build the Docker image and run the Docker container by executing the following commands:

  ```
  docker build --network=host . -t hdbo-docker
  docker run -it hdbo-docker bash
  ```
- After this, run the usual command `python gen_config.py total_config.json` to generate the configuration files.
- To start a single run, use the usual command `python run_experiment.py run_[current_date_and_time]/configs/[setting_you_want_to_run].json`, or run all the configurations with `python run_all.py`.
Each of the folders generated inside `run_[current_date_and_time]` contains a subfolder `data_[number_and_name_of_the_function]` that stores a `.dat` file generated by the logger. It tracks the loss evolution (under the name `best-so-far f(x)`) and the CPU times for a specific run. These are the metrics used in our paper to compare the performance of the different algorithms. A sketch of how such a `.dat` file could be read is given below.
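The exact layout of the `.dat` files is defined by `my_logger.py`; as a rough illustration, assuming a whitespace-separated file whose header row names a `best-so-far f(x)` column, the loss trajectory could be extracted like this:

```python
# Rough sketch: extract the "best-so-far f(x)" column from a logger .dat file,
# assuming a whitespace-separated layout with a header row that names the
# columns (the exact format is defined by my_logger.py; adapt as needed).
import csv

def read_best_so_far(dat_path: str) -> list[float]:
    with open(dat_path) as f:
        rows = list(csv.reader(f, delimiter=" ", skipinitialspace=True))
    header, data = rows[0], rows[1:]
    col = header.index("best-so-far f(x)")
    # Skip any repeated header rows that may be written between runs.
    return [float(row[col]) for row in data if row and row[col] != "best-so-far f(x)"]

values = read_best_so_far("example.dat")  # path to one .dat file
print(f"{len(values)} evaluations, final best-so-far: {values[-1]}")
```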
To get the plots shown in the paper, run `python visualize_results.py` with a `./results/` folder containing the results of the experiments, i.e., the generated folder structure with the `.dat` files described above.