For the project, you can do anything you want, provided it uses what we learned in the course. Here's a summary of what's expected:
- Python functions in modules with tests
- Streamlit for the user interface
- Pandas for data manipulation
- Visualizations (seaborn, geopandas, folium, plotly express)
- APIs (the cent.ischool-iot.net portal or any other APIs of your choosing) and/or web scraping using Playwright if you cannot find an API
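As an illustration of the first expectation, here is a minimal sketch of a function in a module together with a test for it. The function `clean_column_name` and the file names `code/utils.py` and `tests/test_utils.py` are hypothetical examples, not requirements of the assignment; both pieces are shown in one block so the example is self-contained.

```python
# A function that would live in a module in your code folder,
# e.g. code/utils.py (hypothetical file name).
def clean_column_name(name: str) -> str:
    """Trim, lowercase, and join the words of a column name with underscores."""
    return "_".join(name.strip().lower().split())


# The matching test, e.g. tests/test_utils.py (hypothetical file name).
# In a real test file you would import the function first; how that import
# is written depends on how your assignments put the code folder on the path.
def test_clean_column_name():
    assert clean_column_name("  Max Temp ") == "max_temp"
    assert clean_column_name("City") == "city"
    assert clean_column_name("Wind  Speed") == "wind_speed"
```

Running `pytest` from the project folder would find files named `test_*.py` and run every function whose name starts with `test_`, so each test file can stay small and focused on one module.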
Your grade is proportional to the number of expectations you complete with intentionality.
It's up to you! Do you need some ideas? Consider building a data pipeline like we did in assignments 04, 05, and 06, and then building a dashboard around the data using visualizations and Streamlit.
Break your pipeline into separate Python programs, one for each step, similar to assignments 05 and 06:
- Extract data from APIs, web scraping, or a dataset. Save the data to a file in your cache folder.
- Transform the data into a format that is useful for your dashboard. Save the data to a file in your cache folder.
- Load the data into a pandas DataFrame and interact with it using Streamlit and charts, graphs, or maps.
- As part of your extract or transform steps, you might need to write functions. Make sure you write tests for these functions, similar to what has been demonstrated in the class assignments.
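A minimal sketch of the transform and load steps, assuming (hypothetically) that your extract program saved a list of JSON records with `name` and `value` keys to `cache/raw.json`. The file names, column names, and the functions `transform`, `run_transform`, and `load_clean` are illustrative choices, not requirements. Because `transform` only takes data in and returns a DataFrame, it is the easiest part of the pipeline to test without touching files or the network.

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical default location, relative to where you run the program.
CACHE_DIR = Path("cache")


def transform(raw_records: list[dict]) -> pd.DataFrame:
    """Turn raw JSON records into a tidy DataFrame.

    Drops records with no "value", converts "value" to float, and sorts
    rows from largest to smallest value.
    """
    df = pd.json_normalize(raw_records).dropna(subset=["value"])
    df = df.assign(value=df["value"].astype(float))
    return df.sort_values("value", ascending=False).reset_index(drop=True)


def run_transform(cache_dir: Path = CACHE_DIR) -> Path:
    """Transform step: read cache/raw.json and write cache/clean.csv."""
    raw = json.loads((cache_dir / "raw.json").read_text(encoding="utf-8"))
    out_path = cache_dir / "clean.csv"
    transform(raw).to_csv(out_path, index=False)
    return out_path


def load_clean(cache_dir: Path = CACHE_DIR) -> pd.DataFrame:
    """Load step: read the transformed data back for the dashboard."""
    return pd.read_csv(cache_dir / "clean.csv")
```

Your transform program could call `run_transform()`, and your Streamlit app could call `load_clean()` and pass the result to `st.dataframe`, or build a chart with plotly express and display it with `st.plotly_chart`. Keeping each step's output in the cache folder lets you rerun the dashboard without repeating the earlier steps.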
Here are some places where you can find APIs that might interest you:
- Awesome Public APIs: https://github.com/public-apis/public-apis
- US Government APIs: https://api.data.gov/
- Portal of Public APIs: https://publicapis.dev/
- Always welcome to use: https://cent.ischool-iot.net/
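A minimal sketch of an extract program that calls a JSON API and saves the raw response to your cache folder, using only the Python standard library (the `requests` package would work just as well). The function names `fetch_json` and `save_to_cache` and the default `cache` path are hypothetical; the real URL, query parameters, and any API key depend on the API you choose.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical default location, relative to where you run the program.
CACHE_DIR = Path("cache")


def fetch_json(url: str, params: Optional[dict] = None, timeout: float = 10.0):
    """Request `url` (with optional query parameters) and parse the JSON body."""
    if params:
        url = f"{url}?{urlencode(params)}"
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def save_to_cache(data, filename: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Write `data` as JSON to cache_dir/filename, creating the folder if needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path
```

For example, `save_to_cache(fetch_json(url, {"limit": 100}), "raw.json")` would fetch results and store them, where `url` and the `limit` parameter are placeholders for whatever your chosen API documents. Saving the raw response first means the transform step can be rerun without calling the API again.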
Here are some suggested data sources and datasets of interest:
- SU Library: https://researchguides.library.syr.edu/az/databases?t=8828
- Data Catalog: https://data.world/datasets/free
- Most cities have an open data portal, for example: https://opendata.cityofnewyork.us/
- British Film Institute: https://www.bfi.org.uk/industry-data-insights
- NASA Data, for example: https://www.earthdata.nasa.gov/
- Sports Data: https://www.sports-reference.com/
- US Government data: https://data.gov/
- Google Dataset Search: https://datasetsearch.research.google.com/
- Kaggle Datasets: https://www.kaggle.com/datasets
- Awesome Public Datasets: https://github.com/smuthubabu/awesome-public-datasets
When you are done, push your GitHub repository to the GitHub Classroom assignment. You can push multiple times, but the last push before the deadline is what will be graded.
- Complete your project in the `code` folder and have your tests in the `tests` folder. Any files you save should be in the `cache` folder.
- There is no graderbot here, so complete `about-project.md` to explain what your project does and how I should run it. It's okay if I need to run a couple of Python programs.
- Complete your `reflection.md` to reflect on what you learned, what you struggled with, and what you might do if you had more time with the project.
It is important to me that I see "YOU" in your project. That means:
- Share what you learned,
- Do something that interests you,
- Explain what you did and how it works!