pip install osu-data; osu-data -m mania -v top_1000 -ymd YYYY_MM_DD
Docker must be installed and running on your machine.
Retrieves database data from https://data.ppy.sh/ and hosts it on a local MySQL
server.
Optionally, store all ranked/loved `.osu` files in a service with the `-f` flag.
IMPORTANT: MySQL data persists across runs. Recreate the MySQL Service if you changed the data used.
- Install via pip:
  `pip install osu-data`
- Minimally, specify:
  - `-m`, `--mode`: The game mode to build the database with. `osu`, `taiko`, `catch` or `mania`
  - `-v`, `--version`: The database version. `top_1000`, `top_10000` or `random_10000`
- Optionally, specify:
  - `-ymd`, `--year_month_day`: The year, month, day of the database in the format `YYYY_MM_DD`
  - `-p`, `--port`: The port to expose MySQL on. Default is `3308`
  - `-f`, `--files`: Whether to download `.osu` files.
  - `-np`, `--nginx-port`: The port to expose the nginx service on. Default is `8080`. Not used if `-f` is not specified.
  - `--...`: See the table below; these are optional flags to include or exclude more data. Specifying a flag INVERTS its default value.
| Option | Default Value |
|---|---|
| `--beatmap-difficulty-attribs` | False |
| `--beatmap-difficulty` | False |
| `--scores` | True |
| `--beatmap-failtimes` | False |
| `--user-beatmap-playcount` | False |
| `--beatmaps` | True |
| `--beatmapsets` | True |
| `--user-stats` | True |
| `--sample-users` | True |
| `--counts` | True |
| `--difficulty-attribs` | True |
| `--beatmap-performance-blacklist` | True |
These defaults are chosen to be the most useful for analysis while keeping imports fast.
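The "specifying a flag inverts its default" behaviour can be sketched with `argparse`. This is only an illustration of the idea, not the parser that `osu-data` actually ships: the `DEFAULTS` dict is copied from the table above, while `build_parser`, `resolve` and the `osu-data-sketch` program name are hypothetical.

```python
import argparse

# Defaults copied from the options table in this README.
DEFAULTS = {
    "beatmap-difficulty-attribs": False,
    "beatmap-difficulty": False,
    "scores": True,
    "beatmap-failtimes": False,
    "user-beatmap-playcount": False,
    "beatmaps": True,
    "beatmapsets": True,
    "user-stats": True,
    "sample-users": True,
    "counts": True,
    "difficulty-attribs": True,
    "beatmap-performance-blacklist": True,
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: each flag flips its table default when present."""
    parser = argparse.ArgumentParser(prog="osu-data-sketch")
    for name, default in DEFAULTS.items():
        # A default-True option is switched off by its flag (store_false);
        # a default-False option is switched on by its flag (store_true).
        action = "store_false" if default else "store_true"
        parser.add_argument(f"--{name}", action=action, default=default)
    return parser


def resolve(argv: list[str]) -> dict[str, bool]:
    """Return the effective include/exclude setting for every option."""
    return vars(build_parser().parse_args(argv))


if __name__ == "__main__":
    # Passing --beatmap-difficulty includes it; passing --scores excludes scores.
    for option, enabled in resolve(["--beatmap-difficulty", "--scores"]).items():
        print(f"{option}: {enabled}")
```

Note that `argparse` stores the results under underscore names (for example `beatmap_difficulty`), so the keys of the returned dict differ slightly from the flag spellings.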
E.g.
osu-data \
-m osu -v top_1000 -ymd 2023_08_01 -p 3308 -f \
--beatmap-difficulty
- Download the top 1000 osu! standard beatmaps
- from 1st August 2023
- expose MySQL on port 3308
- download `.osu` files
- include beatmap difficulty data
- Connect on:
  - `localhost:<MYSQL_PORT>`
  - `localhost:<NGINX_PORT>` (if `-f` is specified)
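Once the services are up, a quick way to confirm the MySQL endpoint is reachable is to list its databases from Python. The sketch below assumes the third-party `pymysql` package is installed (`pip install pymysql`) and uses the default port `3308` with the `root` user and a blank password, as described in this README. The helper names `mysql_connect_kwargs` and `list_databases` are made up for illustration.

```python
def mysql_connect_kwargs(port: int = 3308) -> dict:
    """Connection settings for the local osu-data MySQL service.

    Assumes the README defaults: user `root`, blank password, and the
    port passed to `-p` (3308 if not specified).
    """
    return {"host": "localhost", "port": port, "user": "root", "password": ""}


def list_databases(port: int = 3308) -> list[str]:
    """Return the database names visible on the local MySQL service."""
    import pymysql  # third-party dependency, imported lazily

    connection = pymysql.connect(**mysql_connect_kwargs(port))
    try:
        with connection.cursor() as cursor:
            cursor.execute("SHOW DATABASES")
            return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()


if __name__ == "__main__":
    # Requires the osu-data MySQL container to be running.
    print(list_databases())
```

If the call fails with a connection error, check that the Docker daemon and the compose project are running and that the port matches the one passed to `-p`.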
- Docker daemon is not running. Make sure that Docker is installed and running. If you're using Docker Desktop, make sure it's actually started.
- MySQL data is incorrect. A few possible reasons:
  - Import was abruptly stopped. This can cause some `.sql` files to be missing or incomplete. Delete the whole compose project and try again.
  - Didn't specify the optional flags to include files. By default, some `.sql` files are not loaded. Take a look at `osu-data -h` and specify the optional flags to include them.
  - Data is outdated. By default, on every re-run of `osu-data`, the data is preserved. To update the data, you must delete the whole compose project and try again.
- `wget: server returned error: HTTP/1.1 404 Not Found`. This happens when you try to pull a `YYYY_MM_DD` that doesn't exist, and happens often when the data isn't yet ready at the start of each month. Check https://data.ppy.sh/ to see which `YYYY_MM_DD` are available.
- `rm: can't remove '../osu.mysql.init/*'`: This is safe to ignore.
- MySQL credentials. By default, MySQL doesn't have a password, so just use `root` as the username and leave the password blank.
- No `files` service. This is the default; the `files` service is optional and must be activated with the `-f` flag. See `osu-data -h` for more info.
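To avoid the 404 case above, it can help to check that a dump exists before starting an import. The sketch below validates the `YYYY_MM_DD` string and builds a download URL. The file name pattern `<YYYY_MM_DD>_performance_<mode>_<version>.tar.bz2` is an assumption about how https://data.ppy.sh/ names its dumps, and `dump_url` and `dump_exists` are hypothetical helpers, not part of `osu-data`. It also assumes the server answers `HEAD` requests.

```python
import re
import urllib.error
import urllib.request
from datetime import datetime

MODES = {"osu", "taiko", "catch", "mania"}
VERSIONS = {"top_1000", "top_10000", "random_10000"}


def dump_url(ymd: str, mode: str, version: str) -> str:
    """Build the (assumed) download URL for one data dump."""
    if not re.fullmatch(r"\d{4}_\d{2}_\d{2}", ymd):
        raise ValueError(f"expected YYYY_MM_DD, got {ymd!r}")
    datetime.strptime(ymd, "%Y_%m_%d")  # raises ValueError for impossible dates
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if version not in VERSIONS:
        raise ValueError(f"unknown version {version!r}")
    # ASSUMPTION: naming pattern of the archives hosted on data.ppy.sh.
    return f"https://data.ppy.sh/{ymd}_performance_{mode}_{version}.tar.bz2"


def dump_exists(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request; an HTTP error such as 404 means 'not available'."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


if __name__ == "__main__":
    url = dump_url("2023_08_01", "mania", "top_1000")
    print(url, "available" if dump_exists(url) else "not available")
```

Browsing https://data.ppy.sh/ directly remains the authoritative way to see which dates are published.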
The database is tuned for fast imports, and thus shouldn't be used for production. Notably, we set `innodb_doublewrite = 0`, which can compromise data integrity in the event of a crash. If you want to use this for production, we recommend setting it up from this Git repo and tweaking `mysql.cnf`.
- Do not distribute the built images as per peppy's request. Instead, you can just share the code to build your image, which should yield the same result.
- This database is meant for analysis; it's not tuned for production. Tweak `mysql.cnf` after importing for more MySQL customizations.
- Finally, be mindful of the conclusions you draw from the data.
- 0.1.5:
  - Allowed wider range of Python versions `3.9 ~ 4.0`.
- 0.2.0:
  - Added GitHub Actions to automatically create dataset on workflow dispatch.
  - Year, Month specification is now Year, Month, Day because some data dumps don't fall exactly on day 1. `-ym` -> `-ymd`, `--year-month` -> `--year-month-day`
  - Default of `-ymd` is removed to encourage users to check the source of the data.