US-Transportation is the name of our dataset, which contains sensor data from 13 users. Given the lack of a common benchmark for Transportation Mode Detection (TMD) in the literature, we collected a large set of measurements from different subjects through a simple Android application. We openly release the dataset so that other researchers can benefit from it, both for further improvements and for research reproducibility.
Our dataset was built with people of different gender, age and occupation. Moreover, we do not impose any restriction on how the application is used: each user records data while performing the activity as he/she normally would, in order to capture real-world conditions.
On this page, in addition to the downloadable datasets, you can find Python code for extracting features and building machine learning models to make predictions.
You can find more information about the dataset and our work at: http://cs.unibo.it/projects/us-tm2017/index.html.
Please cite the paper below in your publications if it helps your research:
@article{carpineti18,
  Author = {Claudia Carpineti and Vincenzo Lomonaco and Luca Bedogni and Marco Di Felice and Luciano Bononi},
  Journal = {Proc. of the 14th Workshop on Context and Activity Modeling and Recognition (IEEE COMOREA 2018)},
  Title = {Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity},
  Year = {2018},
  DOI = {10.1109/PERCOMW.2018.8480119}
}
Online version available here: https://ieeexplore.ieee.org/abstract/document/8480119
In order to execute the code in the repository, you will need to install the following dependencies:
In this section we describe the functions developed in our work and their parameters. The first table lists the data-preparation functions; the second lists the classification functions.
Function name | Parameter | Description |
---|---|---|
clean_file() | | Fix problems in the original raw files |
transform_raw_data() | | Transform sensor raw data into orientation-independent data (using the magnitude metric) |
__fill_data_structure | | Fill the tm, users and sensors data structures with the corresponding data from the dataset |
__range_position_in_header_with_features(sensor_name) | sensor_name: name of the sensor | Return the position of the input sensor in the header with features |
create_header_files() | | Fill the directory with all files consistent with the header without features |
__create_time_files() | | Fill the directory with all files consistent with the featured header, divided into time windows |
__create_dataset() | | Create the dataset file |
__split_dataset() | | Split the passed dataframe into train and test sets |
preprocessing_files() | | Clean the files and transform them into orientation-independent data |
analyze_sensors_support() | | For each sensor, analyze user support and write the result to sensor_support.csv [sensor,nr_user,list_users,list_classes] |
create_balanced_dataset(sintetic) | sintetic: whether the data are synthetic; defaults to False | Analyze the dataset composition in terms of class and user contribution; fill balance_time with the minimum number of windows per transportation mode |
get_excluded_sensors(sensor_set) | sensor_set: type of sensor dataset used with different sensor data | Return the list of excluded sensors for the corresponding classification level |
get_remained_sensors(sensor_set) | sensor_set: type of sensor dataset used with different sensor data | Return the list of considered sensors for the corresponding classification level |
get_sensors_set_features() | | Return the list of sensor sets with their features |
get_sensor_features(sensor) | sensor: data of a specific sensor | Return the features of a specific sensor |
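As a rough illustration of the orientation-independent transform mentioned in the transform_raw_data() row above, the sketch below computes the magnitude metric (the Euclidean norm of the three axes). The column names and sample values are assumptions made for illustration, not the repository's actual file format.

```python
import numpy as np
import pandas as pd

def magnitude(df, x_col="x", y_col="y", z_col="z"):
    """Euclidean norm of the three axes: one orientation-independent value per sample."""
    return np.sqrt(df[x_col] ** 2 + df[y_col] ** 2 + df[z_col] ** 2)

# Made-up accelerometer samples (m/s^2); timestamps in milliseconds.
raw = pd.DataFrame({
    "time": [0, 50, 100],
    "x": [0.1, 0.3, 9.6],
    "y": [9.8, 9.7, 0.2],
    "z": [0.2, 0.1, 1.1],
})
raw["accelerometer_magnitude"] = magnitude(raw)
print(raw)
```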
Function name | Parameter | Description |
---|---|---|
decision_tree(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Decision tree algorithm, trained on the full training set and tested on the full test set |
random_forest(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Random forest algorithm, trained on the full training set and tested on the full test set |
neural_network(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Neural network algorithm, trained on the full training set and tested on the full test set |
support_vector_machine(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Support vector machine algorithm, trained on the full training set and tested on the full test set |
classes_combination(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Run the different algorithms while changing the target classes, trying all combinations of two target classes |
leave_one_subject_out(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Run the different algorithms leaving one subject out of training and testing only on that subject, considering all classes in the dataset and only the user's classes |
single_sensor_accuracy() | | Build a model using only the features of a single sensor and evaluate it |
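As a hedged sketch of the leave-one-subject-out evaluation described above, the snippet below uses scikit-learn's LeaveOneGroupOut. The file name and the "user"/"target" column names are assumptions for illustration only; the repository's leave_one_subject_out(sensors_set) performs its own data handling.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

data = pd.read_csv("dataset.csv")        # hypothetical balanced dataset file
X = data.drop(columns=["target", "user"])
y = data["target"]
groups = data["user"]                     # one group per subject

# Each fold trains on all subjects except one and tests on the held-out subject.
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, groups=groups, cv=LeaveOneGroupOut(),
)
print("per-subject accuracy:", scores)
```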
Before starting, you must first download the data:
python download_dataset.py
Then you have to clean the raw data and extract the features:
python TMDataset.py
Finally, you can build the models:
python TMDetection.py
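If you prefer to drive these three steps from a single Python script, a minimal sketch is shown below (assuming it is run from the repository root):

```python
import subprocess
import sys

# Run the pipeline end to end: download the data, clean it and extract
# features, then build and evaluate the models.
for script in ("download_dataset.py", "TMDataset.py", "TMDetection.py"):
    subprocess.run([sys.executable, script], check=True)
```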
For further details about our code, see our tutorial section.
The project is currently structured as follows:
.
├── TransportationData
| ├── datasetBalanced
| └── ...
| └── _RawDataOriginal
| └── ...
├── README.md
├── LICENSE
├── const.py
├── function.py
├── TMDataset.py
├── TMDetection.py
├── util.py
├── sintetic_dataset_generator.py
├── sintetic_dataset_config.json
├── download_dataset.py
└── cleanLog.log
This work is licensed under the MIT License.
This project has been developed at the University of Bologna with the effort of several people:
- Marco Di Felice, Associate Professor - email: [email protected]
- Luciano Bononi, Associate Professor - email: [email protected]
- Luca Bedogni, Research Associate - email: [email protected]
- Vincenzo Lomonaco, PhD Student - email: [email protected]
- Matteo Cappella, Master student - email: [email protected]
- Simone Passaretti, Master student - email: [email protected]
- Claudia Carpineti, Master graduate student - email: [email protected]
Q: I would need to know the units of the timestamps of each sensor measurement. In your article, you mention that the sampling frequency is approximately 20 Hz, but you do not specify the units of these timestamps. Are they given in seconds or milliseconds?
A: The timestamps are in milliseconds!
Q: Can I assume that the units of the sensor data (accelerometer, gyroscope and magnetometer) are the standard ones (m/s^2, rad/s and µT, respectively)?
A: Yes.
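As a small worked example of these units, the sketch below converts millisecond timestamps to seconds and estimates the sampling rate, which should come out around 20 Hz. The column names and sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical accelerometer samples: timestamps in milliseconds, values in m/s^2.
df = pd.DataFrame({
    "time_ms": [0, 48, 102, 151, 199],
    "accel_magnitude": [9.81, 9.90, 10.05, 9.78, 9.83],
})

df["time_s"] = df["time_ms"] / 1000.0                 # milliseconds -> seconds
dt = df["time_s"].diff().mean()                       # mean sampling interval (s)
print(f"estimated sampling rate: {1.0 / dt:.1f} Hz")  # roughly 20 Hz
```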