The description of those data sets used in the paper are listed in the table below:
Dataset (Links) | #Instances | #classes | #features | feature type |
---|---|---|---|---|
adult | 32561 | 2 | 14 | mixed |
bank-marketing | 45211 | 2 | 16 | mixed |
banknote | 1372 | 2 | 4 | continuous |
chess | 28056 | 18 | 6 | discrete |
connect-4 | 67557 | 3 | 42 | discrete |
letRecog | 20000 | 26 | 16 | continuous |
magic04 | 19020 | 2 | 10 | continuous |
tic-tac-toe | 958 | 2 | 9 | discrete |
wine | 178 | 3 | 13 | continuous |
activity | 10299 | 6 | 561 | continuous |
dota2 | 102944 | 2 | 116 | discrete |
22470 | 4 | 4714 | discrete | |
fashion | 70000 | 10 | 784 | continuous |
For more information about these data sets, you can click the corresponding links.
Each data set corresponds to two files, *.data
and *.info
. The *.data
file stores the data for each instance. The *.info
file stores the information for each feature.
One row in *.data
corresponds to one instance and one column corresponds to one feature (including the class label).
For example, the adult.data
:
37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | <=50K |
52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K |
31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | >50K |
...... |
One row (except the last row) in *.info
corresponds to one feature (including the class label). The order of these features must be the same as the columns in *.data
. The first column is the feature name, and the second column indicates the characteristics of the feature, i.e., continuous or discrete. The last row does not correspond to one feature. It specifies the position of the class label column.
For example, the adult.info
:
age | continuous |
workclass | discrete |
fnlwgt | continuous |
education | discrete |
education-num | continuous |
...... | |
hours-per-week | continuous |
native-country | discrete |
class | discrete |
LABEL_POS | -1 |
You can run the demo on your own data sets by putting corresponding *.data
and *.info
file into the dataset
folder. The formats of these two files are described above.