A short project analyzing the demographic attributes of high schools in NYC and how they relate to standardized test (SAT) scores.
Overall, the data is processed and analyzed as follows:
- Data cleaning and processing
  - Read in the datasets
  - Convert numeric data stored as text to a numeric data type
  - Standardize the datasets: Since we're merging a bunch of datasets together (some with overlapping information encoded differently), we need to standardize some of the data across datasets
  - Combine and condense datasets: Pare down the data, and merge all datasets together
- Analysis
  - Generate correlation table: Find out which features of schools in NYC most closely correlate with SAT scores
  - Visualize correlation: Generate scatter plots and bar charts to visualize the correlation between several features and SAT scores
  - Exploratory data analysis: Investigate several questions with visualizations for curiosity's sake
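
The cleaning and correlation steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual code: the two small in-memory tables stand in for the real files, and every column name (`DBN`, `sat_score`, `frl_percent`, `total_enrollment`), every value, and the helper functions `to_numeric_columns`, `standardize_key`, and `combine` are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical stand-ins for two source datasets; names and values are invented.
sat = pd.DataFrame({
    "DBN": ["01M292", "01M448", "01M450", "01M458"],
    "sat_score": ["1122", "1172", "1149", "s"],  # "s" mimics a non-numeric placeholder
})
demographics = pd.DataFrame({
    "dbn": ["01M292", "01M448", "01M450", "01M458"],  # same key, different casing
    "frl_percent": [88.6, 71.8, 71.8, 72.8],
    "total_enrollment": [422, 394, 598, 224],
})


def to_numeric_columns(df, columns):
    """Convert text columns to numbers; unparseable values become NaN."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def standardize_key(df, key_name="DBN"):
    """Rename the school-ID column so every dataset uses the same key name."""
    return df.rename(
        columns=lambda c: key_name if c.lower() == key_name.lower() else c
    )


def combine(frames, key="DBN"):
    """Inner-join all datasets on the shared key."""
    merged = frames[0]
    for frame in frames[1:]:
        merged = merged.merge(frame, on=key, how="inner")
    return merged


sat_clean = to_numeric_columns(sat, ["sat_score"])
combined = combine([standardize_key(sat_clean), standardize_key(demographics)])

# Pare down: drop schools without a usable SAT score.
combined = combined.dropna(subset=["sat_score"])

# Correlation table: Pearson r of each numeric feature against the SAT score.
correlations = (
    combined.select_dtypes("number")
    .corr()["sat_score"]
    .drop("sat_score")
    .sort_values()
)
print(correlations)
```

The inner join keeps only schools present in every dataset, which is one reasonable choice for a sketch; a left join anchored on the SAT table would keep more rows at the cost of missing values. The resulting `correlations` series is the kind of table the scatter plots and bar charts would then be drawn from.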