Instructor: Bryan Wilder
This is a project-based course designed to give students training and experience in solving real-world problems using machine learning, exploring the interface between research and practice. The goal is to expose students to the nuances of applying machine learning in the real world, where common assumptions (such as i.i.d. data and stationarity) break down. Students will learn how to formulate real-world business or policy scenarios as machine learning problems, how to address common challenges that arise in applying ML to such problems (e.g., distribution shift or missingness), and how to rigorously evaluate the results of such interventions in practice. Throughout, we will emphasize issues related to ethics and fairness in machine learning, and discuss how choices throughout the machine learning pipeline – including problem formulation, outcome definition, data collection, and model training – contribute to the social impact of algorithmic systems.
The requirements of this course consist of two components: participating in in-class discussions and completing the course project.
This course emphasizes the process of thinking through real-world problems and how and when they can be addressed using machine learning. Accordingly, our class sessions will rely on student participation to discuss potential scenarios and case studies together. You are highly encouraged to engage actively during class discussions, which will make the course much more exciting for everyone. During such discussions, we expect you to be respectful at all times towards your fellow students.
Discussions will loosely follow the role-playing seminar format; for each paper, you will either be assigned a presenter role or complete the non-presenter assignment (described below).
Presenter assignment: You will periodically present a paper in class, taking on one of the following presenter roles.
- Educator: Explain the key ideas in the paper to the class. Keep your presentation to 5-10 minutes maximum.
- Investigator: Investigate one of the paper’s authors. What is their area of expertise? Where have they worked previously? What prior projects might have led to working on this one? Explore what motivated them to write the paper and what biases they may have.
- Reviewer: Discuss one strength and one weakness of the paper. What did you like about the paper? Do you agree with the paper's stance/findings? Did the paper overlook anything? Is there something that could have strengthened the work?
- Discussion Leader: Prepare three discussion questions to ask the class, and lead the class discussion of those questions.
Non-presenter assignment: If you are not presenting the paper, you must still read the paper and provide at least one discussion question about the paper (e.g., something you're uncertain about or would like to hear discussed).
Students will complete a semester-long course project that explores the application of machine learning to a problem of practical interest. Students may work in groups of up to three people. Each group will select a dataset, either its own or one from the set of examples provided; the dataset cannot be a benchmark commonly used in machine learning research. Over a series of assignments, each group will define a problem to be addressed using the dataset, clean and explore the data, develop baselines and machine learning models, and explore the impact of additional desiderata on their pipeline (e.g., fairness, privacy, interpretability, or model efficiency).
Each component will contribute towards the final grade as follows:
Discussion participation (40%), graded out of 30 points
- Presenter assignments: 3 over the course of the semester, 5 points each (15 points total)
- Non-presenter assignments: 15 over the course of the semester (out of 16 possible sessions), 1 point each (15 points total)
- You are allowed to miss one non-presenter assignment without penalty.
- This format may be adjusted as needed during the semester.
Course project (60%), graded out of 100 points
- Assignment 0: 5 points
- Assignment 1: 20 points
- Assignment 2: 20 points
- Assignment 3: 20 points
- Assignment 4: 20 points
- Final presentation: 15 points
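As a rough illustration of how the two components might combine, the sketch below computes a final percentage from raw point totals. It assumes each component is scaled linearly to its weight (30 discussion points mapped to 40%, 100 project points mapped to 60%); the syllabus does not specify the exact formula, and the function name `final_grade` is hypothetical.

```python
# Point totals and weights as listed in the grading breakdown above.
DISCUSSION_MAX = 30    # 3 presenter assignments x 5 + 15 non-presenter x 1
PROJECT_MAX = 100      # 5 + 20 + 20 + 20 + 20 + 15
DISCUSSION_WEIGHT = 0.40
PROJECT_WEIGHT = 0.60


def final_grade(discussion_points: float, project_points: float) -> float:
    """Return a final percentage.

    Assumption (not stated in the syllabus): each component is scaled
    linearly from its point total to its weight.
    """
    if not 0 <= discussion_points <= DISCUSSION_MAX:
        raise ValueError("discussion_points must be between 0 and 30")
    if not 0 <= project_points <= PROJECT_MAX:
        raise ValueError("project_points must be between 0 and 100")
    weighted = (
        DISCUSSION_WEIGHT * discussion_points / DISCUSSION_MAX
        + PROJECT_WEIGHT * project_points / PROJECT_MAX
    )
    # Round to two decimals to hide floating-point noise.
    return round(100 * weighted, 2)


if __name__ == "__main__":
    # Hypothetical student: 27/30 on discussion, 85/100 on the project.
    print(final_grade(27, 85))
```

Under this assumed scaling, one discussion point is worth about 1.33 percentage points of the final grade, while one project point is worth 0.6.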