Authors: Wooheon Hong, Minsoo Kim, with Samsung Electronics
Date: 2021.05.10 ~ 2022.04.29
This is a PyTorch implementation of PUMAD; the detailed code cannot be disclosed.
PUMAD: PU Metric learning for anomaly detection (Information Sciences, 2020)
Distance Hashing-based Filtering
- Find reliable normal and potential anomaly in unlabeled data
Deep Metric Learning
- pulls positive samples toward and pushes negative samples away from each anchor (normal or anomaly); see the sketch below
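As a rough illustration only (the detailed code is not disclosed), this objective can be sketched as a standard triplet margin loss in PyTorch; the margin value mirrors the default listed in the parameter table below:

```python
import torch
import torch.nn.functional as F

def triplet_margin_loss(anchor, pos, neg, margin=2.0):
    """Pull the positive toward the anchor and push the negative away, up to `margin`."""
    d_pos = F.pairwise_distance(anchor, pos)   # anchor <-> positive distance
    d_neg = F.pairwise_distance(anchor, neg)   # anchor <-> negative distance
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# PyTorch also ships this objective as torch.nn.TripletMarginLoss(margin=2.0).
```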
- python 3.8
- see requirements.txt
conda create -n env_name python=3.8
conda activate env_name
pip install --upgrade pip
pip install -r requirements.txt
X: input data [ndarray] *shape: (data, # of features, window length)
y: input labels [ndarray] *shape: (data)
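For reference, a minimal sketch (with made-up sizes and hypothetical file names) of building arrays in this layout from a multivariate time series with a sliding window:

```python
import numpy as np

n_features, window = 6, 10                           # assumed sizes
series = np.random.randn(1000, n_features)           # raw multivariate time series
labels = np.random.randint(0, 2, size=1000)          # 0 = normal, 1 = anomaly

# Slide a fixed-length window to obtain shape (data, # of features, window length).
X = np.stack([series[i:i + window].T for i in range(len(series) - window + 1)])
y = labels[window - 1:]                              # label each window by its last step

np.save("example_X.npy", X)                          # hypothetical file names
np.save("example_y.npy", y)
print(X.shape, y.shape)                              # (991, 6, 10) (991,)
```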
Measures the AUC at each epoch.
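As a rough sketch of such a per-epoch measurement (the scoring rule used here, distance to an assumed normal-center embedding, is not taken from the disclosed code):

```python
import torch
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def epoch_auc(model, X_test, y_test, normal_center):
    """Score each window by its embedded distance to a normal-center vector, then ROC AUC."""
    emb = model(torch.as_tensor(X_test, dtype=torch.float32))
    scores = torch.linalg.norm(emb - normal_center, dim=1).cpu().numpy()
    return roc_auc_score(y_test, scores)
```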
python train.py --device <gpu number> --dataset <name of .npy file> --data_path <location of files> --window <length of sliding window>
The detailed descriptions of the parameters are as follows:
| Parameter name | Description |
|---|---|
| device | device the code runs on, 'cpu' or 'cuda:#' |
| data_name | file name of the input .npy |
| data_path | path to the train and test data |
| window | length of the sliding window, default 10 |
| positive_ratio | ratio of labeled anomalies, default 1 |
| n_bits | # of bits of each hash table, default 5 |
| n_hash_tables | # of hash tables, default 5 |
| model | embedding model, default "wavenet" |
| n_epochs | number of training epochs, default 10 |
| n_batch | batch size, default 512 |
| lr | learning rate, default 1e-4 |
| hidden_size | channel size of the hidden layers, default 128 |
| margin | margin of the triplet loss, default 2 |
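For orientation, these parameters could be wired up with argparse roughly as below; the flag names are inferred from the table and the commands above, and the real train.py may differ:

```python
import argparse

parser = argparse.ArgumentParser(description="PUMAD training (sketch)")
parser.add_argument("--device", default="cuda:0", help="'cpu' or 'cuda:#'")
parser.add_argument("--dataset", dest="data_name", help="file name of the input .npy")
parser.add_argument("--data_path", help="path to the train and test data")
parser.add_argument("--window", type=int, default=10)
parser.add_argument("--positive_ratio", type=float, default=1)
parser.add_argument("--n_bits", type=int, default=5)
parser.add_argument("--n_hash_tables", type=int, default=5)
parser.add_argument("--model", default="wavenet")
parser.add_argument("--n_epochs", type=int, default=10)
parser.add_argument("--n_batch", type=int, default=512)
parser.add_argument("--lr", type=float, default=1e-4)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--margin", type=float, default=2)
args = parser.parse_args()
```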
Evaluation metrics are precision, recall and F1-score.
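As a reference point (not the exact logic of test.py), these metrics can be computed from thresholded predictions with scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 0, 1])   # ground-truth window labels (toy values)
y_pred = np.array([0, 1, 1, 1, 0, 0])   # thresholded anomaly predictions (toy values)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(precision, recall, f1)
```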
python test.py save/[CHECKPOINT].pth
python heatmap.py --savedir save --dataset unsw-nb15 --window_size 10 --plotting
Run test.py first, then run heatmap.py. The parameters of test.py are the same as those of train.py, except for the checkpoint path. The parameters of heatmap.py are:
heatmap.py [-h] [--save_dir SAVE_DIR] [--dataset DATASET] [--window_size WINDOW_SIZE] [--_class {,_normal,_anomaly}] [--plotting]
.
├── heatmap.py
├── model.py
├── puamd_structure.PNG
├── README.md
├── requirements.txt
├── save
├── test.py
├── train.py
└── utils
├── utils_dataset.py
├── utils_dhf.py
└── utils.py
Embedding Model: WaveNet
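model.py is not disclosed, so the following is only a rough sketch of a WaveNet-style encoder (stacked dilated 1-D convolutions) mapping a (batch, # of features, window length) tensor to an embedding; the layer sizes and embedding dimension are assumptions:

```python
import torch
import torch.nn as nn

class WaveNetEncoder(nn.Module):
    """Sketch: dilated 1-D convolutions over the window dimension, then a pooled embedding."""
    def __init__(self, n_features, hidden_size=128, emb_dim=32, n_layers=3):
        super().__init__()
        layers, channels = [], n_features
        for i in range(n_layers):
            layers += [nn.Conv1d(channels, hidden_size, kernel_size=2,
                                 dilation=2 ** i, padding=2 ** i), nn.ReLU()]
            channels = hidden_size
        self.conv = nn.Sequential(*layers)
        self.head = nn.Linear(hidden_size, emb_dim)

    def forward(self, x):                      # x: (batch, n_features, window)
        h = self.conv(x).mean(dim=-1)          # global average pool over time
        return self.head(h)                    # (batch, emb_dim)

emb = WaveNetEncoder(n_features=6)(torch.randn(4, 6, 10))
print(emb.shape)                               # torch.Size([4, 32])
```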
The file with the main function contains the argument settings and the training function.
Make dataset for PUMAD
- Parameters
  - X_data : [ndarray] *shape: (data, # of features, window length)
  - y_data : [ndarray] *shape: (data)
  - n_htables : # of hash tables in DHF
  - n_bits : # of bits in each hash
  - transform : embedding model
- Outputs
  - (anchor, pos, neg) : tuple of anchor tensor, positive tensor, negative tensor
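Since the class itself is not disclosed, here is only a minimal sketch of a Dataset producing such triplets; the sampling policy (anchor and positive from reliable normals, negative from labeled or potential anomalies) is an assumption:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TripletDataset(Dataset):
    """Yield (anchor, pos, neg) from reliable-normal and (potential) anomaly indices."""
    def __init__(self, X, normal_idx, anomaly_idx):
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.normal_idx, self.anomaly_idx = normal_idx, anomaly_idx

    def __len__(self):
        return len(self.normal_idx)

    def __getitem__(self, i):
        anchor = self.X[self.normal_idx[i]]
        pos = self.X[np.random.choice(self.normal_idx)]    # another reliable normal
        neg = self.X[np.random.choice(self.anomaly_idx)]   # labeled or potential anomaly
        return anchor, pos, neg
```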
PyTorch Dataset class for the test dataset
Function for creating data with diverse experiment settings
- Parameters
  - X_train : input train data [ndarray]
  - y_train : input train labels [ndarray]
  - X_test : input test data [ndarray]
  - y_test : input test labels [ndarray]
  - n_labels : # of partially observed anomalies [int]
- Outputs
  - preprocessed_data : [list] *[X_train, y_train, y_train_pu, X_test, y_test]
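As an illustration of how y_train_pu might be derived under the PU setting, the following sketch keeps only n_labels anomalies as observed positives and marks everything else as unlabeled; this is an assumption, not the disclosed code:

```python
import numpy as np

def make_pu_labels(y_train, n_labels, seed=0):
    """Keep n_labels observed anomalies as 1; mark every other sample as unlabeled (0)."""
    rng = np.random.default_rng(seed)
    anomaly_idx = np.flatnonzero(y_train == 1)
    observed = rng.choice(anomaly_idx, size=min(n_labels, len(anomaly_idx)), replace=False)
    y_train_pu = np.zeros_like(y_train)
    y_train_pu[observed] = 1
    return y_train_pu
```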
A function that finds reliable normal and potential abnormal samples through Distance Hashing-based Filtering (DHF) on the embedded data.
- Parameters
  - embedded_anom : embedded anomaly data
  - embedded_noise : embedded unlabeled data
  - n_htables : # of hash tables in DHF
  - n_bits : # of bits in each hash
- Outputs
  - reliable normal index
  - potential abnormal index
  - avg_n_buckets : average # of buckets over the hash tables
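Since utils_dhf.py is not disclosed, the following is only a rough sketch of the idea using random-hyperplane hashing: unlabeled points that never share a bucket with a labeled anomaly across the hash tables are treated as reliable normal, and the rest as potential anomalies. The hash family, the collision rule, and the bucket count are assumptions:

```python
import numpy as np

def dhf(embedded_anom, embedded_noise, n_htables=5, n_bits=5, seed=0):
    """Split unlabeled (noise) indices into reliable-normal and potential-anomaly sets."""
    rng = np.random.default_rng(seed)
    dim = embedded_anom.shape[1]
    collides = np.zeros(len(embedded_noise), dtype=bool)
    n_buckets = []
    for _ in range(n_htables):
        planes = rng.standard_normal((dim, n_bits))           # random hyperplanes -> n_bits hash
        anom_codes = {tuple(b) for b in (embedded_anom @ planes > 0)}
        noise_codes = embedded_noise @ planes > 0
        collides |= np.array([tuple(b) in anom_codes for b in noise_codes])
        n_buckets.append(len({tuple(b) for b in noise_codes} | anom_codes))
    reliable_normal_idx = np.flatnonzero(~collides)            # never hashed next to an anomaly
    potential_anom_idx = np.flatnonzero(collides)              # shared a bucket with an anomaly
    return reliable_normal_idx, potential_anom_idx, float(np.mean(n_buckets))
```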