Optuna GSoC 2024
Optuna is an open-source framework that automates hyperparameter optimization. Its major features are:
- define-by-run interface for defining search spaces,
- state-of-the-art algorithms to efficiently search large spaces and prune unpromising trials for faster results, and
- easy parallelization for hyperparameter searches over multiple threads or processes without modifying code.
Optuna is participating in GSoC 2024 as a member of NumFOCUS.
Coding on Optuna requires a solid foundation in Python. Experience with Git and github.com will also be useful. To write a proposal, you will probably need to familiarize yourself with Optuna, and optuna-examples is a good place to start.
Tip
Although we will evaluate applications based on your proposal, we also need to judge whether the project is feasible given your skills. For this reason, any information about your development skills, such as contributions to open-source projects, is very helpful in demonstrating feasibility.
Note
To contact us, please email us at [email protected].
- Optuna: The Next Generation of Hyperparameter Optimization Software, or
- Kurobako: Maintainable Performance Benchmark Tooling for Optimization Algorithms
Feel free to come up with a project that does not fall into any of the ones proposed above. We are keen to hear your ideas on how the Optuna ecosystem can be improved.
Project Details
Optuna is widely accepted in the community as a hyperparameter optimization tool. In recent years, it has been used not only for hyperparameter optimization but also for various black-box optimization problems. We, the developers, want to apply Optuna to new applications such as large language models (LLMs) and materials exploration. In addition, there are countless things we want to achieve, such as supporting new algorithms and optimization paradigms, improving the usability of existing features, and refactoring. Why don't you join the development of Optuna and work at the forefront of black-box optimization tools?
You will take part in reaching the above goal. The project may extend the applicability of Optuna. You will be dealing with one of the tasks that are not yet completed. Here are some important tasks.
- Develop integrations for large language model (LLM) fine-tuning, and
- Develop integrations for active learning
You can take a listed task or you can also propose a task based on your interest if you have any thoughts.
You will work with the Optuna committers and contributors to develop key items of Optuna. The actual work requires advanced coding skills and communication skills to facilitate discussions. Also, each development item requires the corresponding domain knowledge, i.e., black-box optimization, and LLM or algorithms for active learning. It is a great opportunity to acquire such knowledge through development although it is advantageous to have such knowledge beforehand.
We expect you to write:
- specific use cases or problem setups that you would like to tackle,
- what libraries or algorithms we need to solve them,
- challenges that we may face, or features missing from Optuna, when we tackle them,
- the solutions to the challenges or the features to solve the issues, and
- the interface and the internal design of the new features.
For this project, you have two options:
- to create (a) wrapper(s) of third-party libraries for Optuna to solve the problems of interest, or
- to directly implement (an) algorithm(s) in Optuna so that users can solve the problems.
For the first option, you can find many examples in optuna-integration. For example, BoTorchSampler lets users optimize various problem setups with a Gaussian process-based sampler, and LightGBMTuner yields optimized hyperparameters for LightGBM without requiring users to write the hyperparameter optimization code themselves. Note that we rely on the third-party libraries above to reduce our maintenance cost by avoiding re-implementation of existing modules. If you have an idea for how users could benefit from a new integration of a third-party library with Optuna, please write a proposal based on it.
The second option is, in principle, the same as the first, except that you need to implement the algorithm(s) from scratch. This option is more challenging, and you need to plan your project timeline carefully. As mentioned above, relying on third-party libraries is a reasonable choice where possible. However, if doing so raises potential issues, we may need to implement the algorithm(s) directly in Optuna. In this case, we would like you to describe exactly what the potential issues are and what designs can avoid them.
In summary, our question is what features you think we need in order to use LLMs more effectively in combination with Optuna, or how users can benefit from active learning through the Optuna interface. In addition, how should we implement such features? We expect you to address these questions in your proposal.
- Software development experience using Python,
- High work morale, and
- Communication skills
- Software development experience in teams,
- Knowledge/experience of LLM or active learning,
- Knowledge of RDB (MySQL, PostgreSQL, SQLite), error handling and unit-testing, and
- Knowledge/experience with Bayesian optimization, hyperparameter optimization and black-box optimization
@HideakiImamura, @not522
175h
High
Project Details
Optuna supports a wide range of black-box optimization algorithms and we need to maintain the performance of these algorithms while integrating new functionalities in our daily development. Yet, it is often challenging to monitor their performance only with unit tests. For instance, altering the default values of a constructor argument or some internal logic might degrade optimization performance without clearly exhibiting a bug. Even simple refactoring could potentially lead to such performance degradation. Hence, continuous benchmarking is essential to avoid performance degradation.
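To illustrate why a benchmark can catch what a unit test misses, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical and unrelated to Optuna's or Kurobako's actual code: the toy `sphere` objective, the `random_search` optimizer, the shifted default range that plays the role of a "harmless" change, and the 1.5x tolerance. Both variants return a valid float, so a simple unit test would pass for each, while the comparison over several seeds exposes the degradation.

```python
import random
import statistics


def sphere(x: list[float]) -> float:
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def random_search(seed: int, n_trials: int, low: float, high: float) -> float:
    """Return the best objective value found by uniform random search."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(low, high) for _ in range(2)]
        best = min(best, sphere(x))
    return best


def median_best(low: float, high: float, seeds: range, n_trials: int = 200) -> float:
    """Median of the best values over several seeds, to smooth out noise."""
    return statistics.median(random_search(s, n_trials, low, high) for s in seeds)


seeds = range(20)
baseline = median_best(-5.0, 5.0, seeds)

# Hypothetical change to a default value: the range no longer contains the optimum.
candidate = median_best(1.0, 5.0, seeds)

# Flag a regression when the candidate is clearly worse than the baseline.
regressed = candidate > baseline * 1.5
print(f"baseline={baseline:.3f} candidate={candidate:.3f} regressed={regressed}")
```

In a continuous benchmarking setup, a check like `regressed` would run automatically for each change, with real problems and algorithms in place of the toy ones.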
For benchmarking, Optuna uses a Rust-based tool named Kurobako, originally developed by one of Optuna's core maintainers. Kurobako offers various benchmark problems for black-box optimization and provides a command-line interface for testing black-box optimization algorithms on them. Evaluating Optuna’s algorithms with Kurobako lets us continually track their performance, which helps us determine whether a change needs to be revised before it is merged.
Although our team has used Kurobako for a long time, it still has some problems. The most prominent one concerns the communication between Python and Rust, which sends and receives JSON-formatted data via the standard error and standard output of a process invoked through subprocess. This design makes it hard for developers to simply use debug logs. Another challenge is that Kurobako is written in Rust, which creates a barrier for the majority of Optuna community developers and makes it hard for them to contribute to Kurobako’s development. To overcome these issues, you will work on either migrating Kurobako from Rust to Python or developing Python bindings for Kurobako so that we can leverage the existing codebase.
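The debug-log problem can be reproduced with a small standard-library sketch. It assumes a line-delimited JSON exchange over a child process's standard streams. The message fields (`x`, `value`, `debug`) and the `ask` helper are made up for illustration and do not reproduce Kurobako's actual protocol.

```python
import json
import subprocess
import sys

# Child process: answers each JSON request on stdin with a JSON reply on stdout.
# When "debug" is set, it also prints a log line to the same stream.
CHILD = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    if req.get("debug"):
        print("debug: evaluating", req["x"])  # stray log on the protocol stream
    print(json.dumps({"value": req["x"] ** 2}), flush=True)
"""


def ask(request: dict) -> dict:
    """Send one request to a fresh child and parse the first line it writes."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    first_line = proc.stdout.splitlines()[0]
    return json.loads(first_line)


clean = ask({"x": 3})  # the reply parses normally

try:
    ask({"x": 3, "debug": True})
    log_broke_protocol = False
except json.JSONDecodeError:
    # The parent read the log line where it expected a JSON message.
    log_broke_protocol = True

print(clean, log_broke_protocol)
```

Once a stream carries protocol messages, any ordinary `print` on that stream corrupts the exchange, which is one reason an in-process Python implementation or Python bindings would make debugging easier.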
After the project, Kurobako should be significantly easier for the Optuna community to maintain and to debug with logs. We recommend starting the project by either porting Kurobako to Python or redesigning it from scratch. In addition, the project scope may be extended to incorporate new functionality or to enhance the user experience. Tasks may also involve integration with GitHub Actions and Optuna’s CI. Any suggestions from your side are also welcome.
You get to work with multiple programming languages and contribute to enhancing the overall experience of numerous Optuna developers. Development operations are critical for the efficient management of large open-source projects and you will gain this skill working with our team. The experience and the skills gained during the project can be invaluable for many other projects in your engineering career. Moreover, you will establish strong connections with engineers in one of Japan's leading startups.
We expect you to write:
- the reasons why you think it is better to re-design Kurobako for Python from scratch or inherit the Rust design,
- what the new design looks like for your choice, and
- the timeline of your project.
Essentially, your options are limited to the two above, so we would like to know which you think is better, why, and how you would like to tackle this project. Even if we inherit the Rust design, we would like to know your design ideas because the Rust design will need some modification. By design, we mean the classes we need and the methods each of them requires. As migrating to a Python implementation can slow Kurobako down significantly, it is also advisable to think about how to mitigate the slowdown.
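As a starting point for discussion, a design description at this level of detail might look like the minimal sketch below. All names here (`Problem`, `Solver`, `Quadratic`, `RandomSolver`, `run_benchmark`) are hypothetical and do not mirror Kurobako's Rust types. The sketch only shows the kind of classes and methods a proposal could spell out.

```python
import random
from dataclasses import dataclass, field
from typing import Protocol


class Problem(Protocol):
    """A benchmark problem: declares its search range and evaluates parameters."""

    low: float
    high: float

    def evaluate(self, x: float) -> float: ...


class Solver(Protocol):
    """An optimizer under test, driven by an ask-and-tell loop."""

    def ask(self) -> float: ...

    def tell(self, x: float, value: float) -> None: ...


@dataclass
class Quadratic:
    """Toy problem with its minimum of 0 at x = 2."""

    low: float = -10.0
    high: float = 10.0

    def evaluate(self, x: float) -> float:
        return (x - 2.0) ** 2


@dataclass
class RandomSolver:
    """Baseline solver that samples uniformly from the problem's range."""

    problem: Problem
    seed: int = 0
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def ask(self) -> float:
        return self._rng.uniform(self.problem.low, self.problem.high)

    def tell(self, x: float, value: float) -> None:
        self.history.append((x, value))


def run_benchmark(problem: Problem, solver: Solver, budget: int) -> list[float]:
    """Return the best-so-far value after each evaluation."""
    best = float("inf")
    curve = []
    for _ in range(budget):
        x = solver.ask()
        value = problem.evaluate(x)
        solver.tell(x, value)
        best = min(best, value)
        curve.append(best)
    return curve


problem = Quadratic()
curve = run_benchmark(problem, RandomSolver(problem, seed=1), budget=50)
print(f"best value after {len(curve)} evaluations: {curve[-1]:.4f}")
```

Because everything runs in one Python process, ordinary logging and debuggers work inside `evaluate`, `ask`, and `tell`. A real proposal would also need to cover multi-dimensional search spaces, result storage, and the cost of expensive problems, which this sketch leaves out.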
- Software development experience using Python
- High work morale and communication skills
- Software development experience in teams
- Knowledge/experience with Bayesian optimization, hyperparameter optimization and black-box optimization, and benchmarking
- Skill to read and understand Rust code
@nabenabe0928, @y0z
175h
Medium