###Coursera : Getting and Cleaning Data Project
- Peer-assessed project for the analysis of Samsung Galaxy S II movement measurements
- Dataset was downloaded from (https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip) on 21st April 2014
- Further description of the data can be found at http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
###Aim of the script
- Using the `training` and `test` datasets to read the measurements taken (X)
- Subset the measurements taken to those only reporting Mean and Standard Deviation values (those including mean() and std())
- Assign columns referring to the IDs of Activity and Subject
- Merge `training` and `test` datasets into a single resource
- Rename the Activity IDs to more descriptive values found in `activity_labels.txt`
- Calculate the mean values for each Subject and Activity by melting the dataset and recasting
- Write the final tidy dataset to files
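
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the contents of `run_analysis.R`: the feature names, values, and the helper `build_set` are made up for the example, and base R `aggregate()` is swapped in for the melt/recast step so that the sketch needs no extra packages.

```r
# Toy stand-ins for features.txt and activity_labels.txt (values are made up).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: label the measurements (X), keep only the mean()/std()
# columns, and attach the Subject and Activity ID columns.
build_set <- function(X, y, subject) {
  colnames(X) <- features
  keep <- grepl("mean()", features, fixed = TRUE) |
          grepl("std()", features, fixed = TRUE)
  data.frame(Subject = subject, Activity = y, X[, keep, drop = FALSE],
             check.names = FALSE)
}

# Two toy "training" rows and one toy "test" row.
train <- build_set(as.data.frame(matrix(c(1, 3, 10, 20, 0, 0), nrow = 2)),
                   y = c(1, 1), subject = c(1, 1))
test  <- build_set(as.data.frame(matrix(c(5, 50, 0), nrow = 1)),
                   y = 2, subject = 2)

# Merge both sets and replace Activity IDs with descriptive labels.
merged <- rbind(train, test)
merged$Activity <- activity_labels$name[match(merged$Activity, activity_labels$id)]

# Mean of every remaining measurement for each Subject/Activity pair
# (aggregate() used here in place of melt + recast).
measure_cols <- setdiff(names(merged), c("Subject", "Activity"))
tidy <- aggregate(merged[measure_cols],
                  by = list(Subject = merged$Subject, Activity = merged$Activity),
                  FUN = mean)

# Write the result; a temporary directory is used to avoid side effects.
write.table(tidy, file.path(tempdir(), "tidy_data.txt"), row.names = FALSE)
write.csv(tidy, file.path(tempdir(), "tidy_data.csv"), row.names = FALSE)
```

Matching with `fixed = TRUE` treats `mean()` and `std()` as literal strings, so names such as `meanFreq()` are not selected.
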
###Steps for usage of run_analysis.R script
- Begin by cloning this repository
- Download the dataset and extract the files maintaining the folder structure. This will result in a `/UCI HAR Dataset/` folder that has all the necessary data files.
- Place the run_analysis.R script in this base folder.
- Set the working directory to the /UCI HAR Dataset/ folder.
- Run `source("run_analysis.R")`
- The tidy dataset will be created in the working directory as `tidy_data.txt` and `tidy_data.csv` files
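
For convenience, the usage steps could also be driven from R itself. The sketch below is only an assumption about how one might do that: the helper name `run_from_scratch` and its arguments are hypothetical, the URL is the one quoted in this README, and the call is left commented out because it needs network access.

```r
# URL as quoted in this README.
dataset_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download, extract, place the script, and run it.
run_from_scratch <- function(script = "run_analysis.R", zip_path = "Dataset.zip") {
  # Binary mode keeps the zip archive intact on Windows.
  download.file(dataset_url, destfile = zip_path, mode = "wb")
  # unzip() preserves the folder structure stored in the archive.
  unzip(zip_path)
  data_dir <- "UCI HAR Dataset"
  # Place the script in the base folder of the dataset.
  file.copy(script, data_dir, overwrite = TRUE)
  # Work inside the dataset folder, restoring the old directory afterwards.
  old_wd <- setwd(data_dir)
  on.exit(setwd(old_wd))
  source(basename(script))
  # Report whether both output files now exist.
  file.exists(c("tidy_data.txt", "tidy_data.csv"))
}

# run_from_scratch()  # uncomment to run; requires network access
```
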
###Archive Files
- The Dataset.zip archive contains folders named `/test` and `/train`
- Each of these folders contains data in the form of:
    - `X_(test/train).txt` - Measurements of data
    - `y_(test/train).txt` - Activity ID for each row of the data
    - `subject_(test/train).txt` - Corresponding Subject ID for each row of data
- As well as the raw data:
    - Names of the measurements taken in `X_(test/train).txt` are reported in `features.txt`
    - Descriptive Activity labels for the Activity IDs are reported in `activity_labels.txt`
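
To show how these files fit together, the sketch below builds a tiny mock of the folder layout in a temporary directory and reads it back. The file contents are made up and the helper `read_split` is hypothetical; only the file and folder names follow the layout described above.

```r
# Build a tiny mock of the extracted archive (contents are made up).
base_dir <- file.path(tempdir(), "UCI HAR Dataset")
dir.create(file.path(base_dir, "train"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X"),
           file.path(base_dir, "features.txt"))
writeLines(c("1 WALKING", "2 SITTING"),
           file.path(base_dir, "activity_labels.txt"))
writeLines(c("0.1 0.2", "0.3 0.4"), file.path(base_dir, "train", "X_train.txt"))
writeLines(c("1", "2"), file.path(base_dir, "train", "y_train.txt"))
writeLines(c("7", "7"), file.path(base_dir, "train", "subject_train.txt"))

# Measurement names: second column of features.txt.
features <- read.table(file.path(base_dir, "features.txt"),
                       stringsAsFactors = FALSE)[[2]]

# Activity ID to label lookup.
activity_labels <- read.table(file.path(base_dir, "activity_labels.txt"),
                              col.names = c("id", "name"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: read one split ("train" or "test") into a data frame
# with Subject, Activity, and one column per measurement.
read_split <- function(base_dir, split, features) {
  path <- function(prefix) {
    file.path(base_dir, split, paste0(prefix, "_", split, ".txt"))
  }
  X <- read.table(path("X"))
  names(X) <- features  # assigned afterwards so the names are kept verbatim
  y <- read.table(path("y"), col.names = "Activity")
  subject <- read.table(path("subject"), col.names = "Subject")
  cbind(subject, y, X)
}

train <- read_split(base_dir, "train", features)
```

Each row of `X_train.txt` lines up with the same row of `y_train.txt` and `subject_train.txt`, which is why a plain column bind is enough here.
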