NYC Vehicle Collision Dataset Analysis with National Oceanic and Atmospheric Administration (NOAA) weather dataset
The data for the project analysis were downloaded from the following sites:
- The vehicle collision dataset was downloaded from NYC Open Data using the Socrata Open Data API - API Request
- The weather dataset was downloaded from NOAA Weather Data, where you select the date range and the type of data needed for the analysis. Processing the request takes 2-3 days, depending on the type and size of the data requested - CSV File
- The vehicle collision data is downloaded using the API token obtained after registering on their website. The data is returned as JSON files.
- The weather data is downloaded after receiving an email from NOAA Weather Data with the download link.
- The notations in the weather dataset are as follows:
  - PRCP = Precipitation (inches as per user preference, inches on Daily Form)
  - SNOW = Snowfall (inches as per user preference, inches on Daily Form)
  - SNWD = Snow depth (inches as per user preference, inches on Daily Form)
- The downloaded vehicle collision JSON files are stored by New York borough (Manhattan, Queens, etc.) and by zip code within each borough. You can view the respective folder structure here.
- The weather dataset has been stored as a CSV file.
- The vehicle collision JSON files have been converted into CSV files by checking each key-value pair, and are stored here.
- The data from both datasets are merged using pandas DataFrames on the common 'DATE' column available in both.
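The conversion and merge steps above could look roughly like the following sketch. The sample records, the lowercase JSON keys, and the helper name `collisions_to_frame` are assumptions made for illustration; only the 'DATE' join column and the PRCP, SNOW and SNWD fields come from the description above.

```python
import pandas as pd

# Hypothetical records standing in for one downloaded collision JSON file.
collision_records = [
    {"date": "2016-01-23T00:00:00.000", "borough": "QUEENS", "number_of_persons_injured": 2},
    {"date": "2016-01-23T00:00:00.000", "borough": "BRONX", "number_of_persons_injured": 1},
    {"date": "2016-01-24T00:00:00.000", "borough": "BROOKLYN", "number_of_persons_injured": 0},
]

# Hypothetical rows standing in for the NOAA CSV file.
weather = pd.DataFrame({
    "DATE": ["2016-01-23", "2016-01-24"],
    "PRCP": [1.0, 0.0],
    "SNOW": [5.0, 0.0],
    "SNWD": [2.0, 6.0],
})


def collisions_to_frame(records):
    """Flatten a list of JSON records into a DataFrame with a daily 'DATE' column."""
    frame = pd.DataFrame.from_records(records)
    # Upper-case the keys so the join column matches the weather file ('DATE').
    frame = frame.rename(columns=str.upper)
    # Drop the time component so each row carries a calendar date only.
    frame["DATE"] = pd.to_datetime(frame["DATE"]).dt.normalize()
    return frame


collisions = collisions_to_frame(collision_records)
weather["DATE"] = pd.to_datetime(weather["DATE"])

# Every collision row picks up the weather readings for its day.
merged = collisions.merge(weather, on="DATE", how="inner")
print(merged[["DATE", "BOROUGH", "SNOW"]])
```

An inner join keeps only days present in both files; `how="left"` would be the choice if collisions on days without a weather reading should be retained.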
The following analyses were performed using the merged dataset.
Finding the top 10 factors that caused collisions in NYC, using the Contributing Factor and Number of Persons Killed fields.
- Reading the raw data from the CSV file.
- Taking the total count of persons killed, grouped by the factor responsible for the collision.
- Removing unwanted entries where the factor is not specified.
- Inserting the data into a dataframe with the factors and the corresponding number of people killed in the collisions.
- Saving the data from the dataframe to a CSV file.
- Plotting a graph to display the factors for the same.
- Saving the plotted graph as a PNG file.
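A minimal sketch of the steps above, assuming the CSV uses the column names `CONTRIBUTING FACTOR VEHICLE 1` and `NUMBER OF PERSONS KILLED` and marks missing factors with the value `Unspecified`. The function names and the sample rows are hypothetical.

```python
import pandas as pd

FACTOR_COL = "CONTRIBUTING FACTOR VEHICLE 1"  # assumed column name
KILLED_COL = "NUMBER OF PERSONS KILLED"       # assumed column name


def top_factors_by_deaths(collisions, n=10):
    """Return the n contributing factors with the most persons killed."""
    # Drop rows whose factor is missing or recorded as 'Unspecified'.
    specified = collisions[
        collisions[FACTOR_COL].notna() & (collisions[FACTOR_COL] != "Unspecified")
    ]
    totals = specified.groupby(FACTOR_COL)[KILLED_COL].sum()
    return totals.nlargest(n).reset_index()


def save_bar_chart(table, png_path):
    """Plot the factor totals as a horizontal bar chart and save it as a PNG."""
    # Imported here so the aggregation above works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # file output only, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(table[FACTOR_COL], table[KILLED_COL])
    ax.set_xlabel("Persons killed")
    fig.tight_layout()
    fig.savefig(png_path)
    plt.close(fig)


# Hypothetical rows standing in for the raw CSV.
sample = pd.DataFrame({
    FACTOR_COL: [
        "Driver Inattention/Distraction", "Unsafe Speed", "Unspecified",
        "Unsafe Speed", "Failure to Yield Right-of-Way", None,
    ],
    KILLED_COL: [1, 2, 5, 1, 2, 4],
})
top = top_factors_by_deaths(sample, n=2)
print(top)
```

With real data, `top.to_csv("top_factors.csv", index=False)` and `save_bar_chart(top, "top_factors.png")` would cover the two saving steps; both file names are placeholders.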
Vehicle types involved in collisions and number of people killed in each borough of NYC from 2015-2017
Finding the top vehicle types that caused collisions in each borough of NYC, using the Borough, Vehicle Type and Number of Persons Killed fields.
- Reading the raw data from the CSV file.
- Taking the total count of persons killed, grouped by the vehicle type and borough involved in the collision.
- Selecting the date range for the collisions.
- Inserting the data into a dataframe with the borough, vehicle type and the corresponding number of people killed in the collisions.
- Saving the data from the dataframe to a CSV file.
- Plotting a graph to display the vehicle type and number of people killed in each borough for the same.
- Saving the plotted graph as a PNG file.
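The grouping and date filtering above can be sketched as follows. The column names `BOROUGH`, `VEHICLE TYPE CODE 1` and `NUMBER OF PERSONS KILLED`, the function name, and the sample rows are assumptions; the 2015-2017 range is taken from the heading of this analysis.

```python
import pandas as pd

VEHICLE_COL = "VEHICLE TYPE CODE 1"      # assumed column name
KILLED_COL = "NUMBER OF PERSONS KILLED"  # assumed column name


def deaths_by_borough_and_vehicle(collisions, start="2015-01-01", end="2017-12-31"):
    """Total persons killed per (borough, vehicle type) within a date range."""
    frame = collisions.copy()
    frame["DATE"] = pd.to_datetime(frame["DATE"])
    # Keep only collisions inside the inclusive date range.
    in_range = frame[frame["DATE"].between(pd.Timestamp(start), pd.Timestamp(end))]
    totals = in_range.groupby(["BOROUGH", VEHICLE_COL], as_index=False)[KILLED_COL].sum()
    # Within each borough, list the deadliest vehicle type first.
    return totals.sort_values(
        ["BOROUGH", KILLED_COL], ascending=[True, False]
    ).reset_index(drop=True)


# Hypothetical rows standing in for the raw CSV.
sample = pd.DataFrame({
    "DATE": ["2014-12-31", "2015-06-01", "2016-03-10",
             "2016-07-04", "2017-11-20", "2018-01-01"],
    "BOROUGH": ["MANHATTAN", "MANHATTAN", "MANHATTAN", "QUEENS", "QUEENS", "QUEENS"],
    VEHICLE_COL: ["TAXI", "TAXI", "PASSENGER VEHICLE",
                  "PASSENGER VEHICLE", "PASSENGER VEHICLE", "BUS"],
    KILLED_COL: [1, 1, 2, 1, 1, 3],
})
totals = deaths_by_borough_and_vehicle(sample)
print(totals)
```

Sorting within each borough keeps the long table readable and puts the top vehicle type for a borough in its first row.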
Finding the snowfall and rainfall amounts, in inches, that caused the maximum number of collision injuries to the people of NYC
- Reading the raw data from the CSV file.
- Selecting the date range from the raw data to do analysis.
- Converting the date column to datetime format.
- Merging the data together and aggregating the collision data by date, since it includes multiple entries for a single day.
- Grouping by snowfall and calculating the total number of persons injured by summing all the injuries.
- Saving the data from the dataframe to a CSV file.
- Plotting a graph to display the snowfall in inches and the number of injuries for the same.
- Saving the plotted graph as a PNG file.
- Following the same steps as above for rainfall analysis.
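One way to sketch the snowfall and rainfall steps is shown below. It sums injuries per day before merging, which is only one reading of the per-day aggregation step. The column name `NUMBER OF PERSONS INJURED`, the function name, the date range, and the sample rows are assumptions.

```python
import pandas as pd

INJURED_COL = "NUMBER OF PERSONS INJURED"  # assumed column name


def injuries_by_weather(collisions, weather, measure, start, end):
    """Total injuries for each recorded value of a weather measure (e.g. 'SNOW')."""
    collisions = collisions.copy()
    weather = weather.copy()
    collisions["DATE"] = pd.to_datetime(collisions["DATE"]).dt.normalize()
    weather["DATE"] = pd.to_datetime(weather["DATE"]).dt.normalize()

    # Restrict the collisions to the inclusive date range.
    in_range = collisions[
        collisions["DATE"].between(pd.Timestamp(start), pd.Timestamp(end))
    ]
    # Collapse the many collision rows per day into one daily injury total.
    daily = in_range.groupby("DATE", as_index=False)[INJURED_COL].sum()
    merged = daily.merge(weather[["DATE", measure]], on="DATE", how="inner")
    # Add up injuries across all days that share the same measurement.
    totals = merged.groupby(measure, as_index=False)[INJURED_COL].sum()
    return totals.sort_values(INJURED_COL, ascending=False).reset_index(drop=True)


# Hypothetical rows standing in for the two raw CSV files.
collisions = pd.DataFrame({
    "DATE": ["2016-01-23", "2016-01-23", "2016-01-24", "2016-02-05"],
    INJURED_COL: [2, 3, 1, 2],
})
weather = pd.DataFrame({
    "DATE": ["2016-01-23", "2016-01-24", "2016-02-05"],
    "SNOW": [10.0, 0.5, 0.5],
    "PRCP": [1.0, 0.0, 0.3],
})

snow = injuries_by_weather(collisions, weather, "SNOW", "2016-01-01", "2016-12-31")
rain = injuries_by_weather(collisions, weather, "PRCP", "2016-01-01", "2016-12-31")
print(snow)
print(rain)
```

Passing the measure as a parameter lets the rainfall analysis reuse the same function with `PRCP` in place of `SNOW`. The first row of each result holds the amount associated with the most injuries.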