NYU CS-GY 6923 Machine Learning

An introduction to the exciting field of machine learning through a mixture of hands-on experience and theoretical foundations.


Course Team:

Christopher Musco, Professor
Prajjwal Bhattarai, Course Assistant
Marc Chiu, Course Assistant
Usaid Malik, Course Assistant
Navya Kriti, Grader
Adith Santosh, Grader

Lectures: Friday 2:00pm-4:30pm, 2 Metrotech, Room 911.
Zoom Recordings available through Brightspace.
Professor office hours: Wednesdays 10am-11:30am, on Zoom.
TA office hours:
Marc: Tuesdays 1pm-3pm, on Zoom.
Prajjwal: Fridays 11am-1pm, in person, 8th floor common area 370 Jay St.
Usaid: Wednesdays 1pm-3pm, on Google Meet.

Grading breakdown: Written Problem Sets 20%, Programming Labs 20%, Midterm 25%, Final Exam 25%, Participation 10%

Ed Discussion: All course communication will be via Ed, so please join our site from Brightspace. Please use Ed instead of email for any questions. We prefer that lecture or homework questions be asked publicly, since the answers will often help your classmates. Ed also supports private questions for matters relevant only to you.

Python and Jupyter: Demos and labs in this class use Python, run through Jupyter notebooks. Jupyter lets you create and edit documents with live Python code alongside rich text and images. We suggest that students run their Jupyter notebooks via Google Colaboratory, and we will share them via Colab.
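To give a flavor of what the demos look like, here is a minimal, hypothetical notebook cell using numpy and matplotlib. The data is made up and this is not part of any assignment:

    import numpy as np
    import matplotlib.pyplot as plt

    # Plot noisy samples of a line, as a typical demo might.
    x = np.linspace(0, 1, 50)
    y = 2 * x + 1 + 0.1 * np.random.randn(50)
    plt.scatter(x, y)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()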

Prerequisites: Modern machine learning uses a lot of math! Probably more than any other subject outside of theoretical computer science. You can get pretty far with an understanding of just calculus, probability, and linear algebra, but that understanding needs to be solid for you to succeed in this course. Formally we require a prior course in probability or statistics. If you need to freshen up on linear algebra, this quick reference from Stanford is helpful.

Homework: Homework (both written problems and coding labs) must be turned in to Gradescope by the specified deadline. You can access our site via Brightspace. You are allowed 3 "slip days", i.e., one-day extensions on any 3 assignments over the course of the semester.

Labs should be turned in as evaluated Jupyter notebooks. Do not clear the output before turning them in. While not required, I encourage students to prepare written problem sets in LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you do write problems by hand, scan and upload them as a PDF.

Discussion is allowed on homework, but solutions and code must be written independently. See the syllabus for details. We have a zero-tolerance policy for copied code or solutions: any students with duplicate or very similar material will receive a zero on the offending assignment. My advice is to never share code or solutions with other students.

Resources: There is no textbook to purchase. I may post readings, some of which will come from An Introduction to Statistical Learning (AISL), which is available free online via the NYU library.

I have also found that the lectures and notes from the companion site to the book Learning From Data (ISBN 978-1-60049-006-4) can be very helpful. This is a compact, inexpensive book if you want to purchase a copy.

Schedule:
Function Fitting and Regression
1. 9/6 Introduction to Machine Learning, Loss Functions, Simple Linear Regression, Multiple Linear Regression
  • If you have not used numpy or matplotlib before, or wish to brush up, work through Demo 1. Not turned in.
  • Work through the simple regression example in Demo 2 (a minimal sketch of the idea appears after this list). Not turned in.
  • Complete Lab 1, Due 11:59pm, Monday 9/16
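As referenced above, here is a minimal numpy sketch of fitting a simple linear regression by minimizing squared loss. The synthetic data and all variable names are illustrative, not course code:

    import numpy as np

    # Synthetic data: y is roughly 3x + 2 plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 100)
    y = 3 * x + 2 + 0.1 * rng.standard_normal(100)

    # Least-squares fit: stack a column of ones for the intercept,
    # then solve min over w of ||Xw - y||^2.
    X = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w)  # close to [2, 3]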
2. 9/13 Finish Multiple Linear Regression, Data Transformations, Model Selection
  • Work through the multiple regression example in Demo 3. Not turned in.
  • Work through Demo 4 on polynomial regression and model order selection (sketched briefly after this list). Not turned in.
  • Complete Lab 2, Due 11:59pm, Tuesday 9/24
  • Complete written Homework 1. Due 11:59pm, Tuesday 10/1. 10% bonus if you typeset solutions in Markdown or LaTeX!
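As referenced above, a minimal sketch of polynomial regression with model order selection via a held-out validation set. All data and names are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 200)
    y = np.sin(3 * x) + 0.2 * rng.standard_normal(200)

    # Hold out a validation set; pick the polynomial degree with the
    # smallest validation error, not the smallest training error.
    x_tr, y_tr, x_val, y_val = x[:150], y[:150], x[150:], y[150:]
    for d in range(1, 10):
        coeffs = np.polyfit(x_tr, y_tr, d)
        val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        print(d, val_err)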
Bayesian Methods and Probabilistic Models
3. 9/20 Regularization, Naive Bayes, the Bayesian Perspective
  • Lecture 3 slides (annotated).
  • Additional reading on feature selection: Chapter 6.1 in AISL.
  • Additional reading on regularization: Chapter 6.2 in AISL.
  • Additional lecture notes on the Naive Bayes Algorithm.
  • Section 3 of these notes gives a nice overview of least squares regression from a statistical/probabilistic modeling perspective.
  • Math to review before lecture: Discrete random variables, probability distributions, joint probability, conditional probability, Bayes' rule. (A short code sketch of the regularization topic follows this list.)
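As a taste of the regularization topic, here is a minimal numpy sketch of ridge regression in closed form. The helper `ridge_fit` and the synthetic data are illustrative only:

    import numpy as np

    def ridge_fit(X, y, lam):
        """Closed-form ridge regression:
        w = (X^T X + lam * I)^(-1) X^T y minimizes
        ||Xw - y||^2 + lam * ||w||^2."""
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(100)
    print(ridge_fit(X, y, lam=0.1))  # larger lam shrinks weights toward 0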
4. 9/27 More Bayesian Machine Learning, Modeling Language
  • Lecture 4 slides (annotated).
  • Additional reading on logistic regression: Chapter 4.1-4.3 in AISL.
  • These notes also give a good overview of logistic regression from a Bayesian perspective.
  • Math to review: Continuous probability density functions, Gaussian random variables (know the expression for the Gaussian probability density function), Laplace random variables.
  • Complete Lab 3. Due 11:59pm, Tuesday 10/8.
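Since the Gaussian density comes up repeatedly in the Bayesian perspective, a minimal numpy sketch of evaluating it (the function and data are illustrative, not course code):

    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        """Density of N(mu, sigma^2):
        exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    # The sample mean maximizes the Gaussian likelihood of the data.
    data = np.array([1.2, 0.8, 1.1, 0.9])
    print(gaussian_pdf(data, mu=data.mean(), sigma=1.0))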
Classification
5. 10/4 K-nearest neighbors, Logistic Regression, Optimization
  • Math to review: Softmax function, logistic loss (both sketched in code after this list).
  • Work through logistic regression demo: demo_breast_cancer.ipynb.
  • Complete written Homework 2. Due 11:59pm, Tuesday 10/15. No slip days allowed for this homework so that we can release solutions before the exam.
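As referenced above, minimal numpy sketches of the softmax function and the logistic loss (illustrative implementations, not course code):

    import numpy as np

    def softmax(z):
        """Numerically stable softmax: subtract max(z) before exponentiating."""
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def logistic_loss(y, p):
        """Negative log-likelihood of a label y in {0, 1} under
        predicted probability p."""
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1
    print(logistic_loss(1, 0.9))  # small loss for a confident, correct prediction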
6. 10/11 Gradient Descent
  • Math to review: Directional derivative, convexity of functions, AM-GM inequality, Cauchy-Schwarz inequality.
  • Here are some sample questions which will be similar to those on the exam. Here are the solutions.
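For a preview of the gradient descent material, a minimal numpy sketch; the function names and the toy objective are illustrative only:

    import numpy as np

    def gradient_descent(grad, w0, step_size=0.1, iters=100):
        """Repeatedly step against the gradient: w <- w - step_size * grad(w)."""
        w = w0
        for _ in range(iters):
            w = w - step_size * grad(w)
        return w

    # Minimize the convex quadratic f(w) = ||w - 1||^2, whose gradient is 2(w - 1).
    print(gradient_descent(lambda w: 2 * (w - np.ones(3)), np.zeros(3)))  # close to [1, 1, 1]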
7. 10/18 Midterm Exam
The midterm will take the first half of lecture. The second half will be a short lecture on Differential Privacy in machine learning.
8. 10/25 Stochastic gradient descent, introduction to learning theory and the PAC model
  • Complete Lab 4. Due 11:59pm, Tue. 11/5.
  • The first half of this lab is a demo, which you should go through slowly. The parts you actually have to fill in don't start until the "L2 Regularization" section.
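For intuition about what this lecture and lab cover, a minimal sketch of stochastic gradient descent for L2-regularized least squares. The helper, data, and hyperparameters are all illustrative, not the lab's code:

    import numpy as np

    def sgd_least_squares(X, y, lam=0.1, step_size=0.01, epochs=20):
        """SGD for L2-regularized least squares: each update uses the
        gradient of a single term of ||Xw - y||^2 + lam * ||w||^2."""
        n, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            for i in rng.permutation(n):
                g = 2 * (X[i] @ w - y[i]) * X[i] + 2 * lam * w / n
                w = w - step_size * g
        return w

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    print(sgd_least_squares(X, y))  # close to [1, -2, 0.5]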
Beyond Linear Methods
9. 11/1 Kernel Methods, Support Vector Machines
10. 11/8 Finish SVMs, Introduction to Neural Nets
11. 11/15 Backpropagation, Convolution, Feature Extraction
  • Lecture 11 slides (annotated).
  • For additional reading on generalization in neural nets, see Chapter 10.8 in An Introduction to Statistical Learning.
  • Demo on convolution and creating convolutional layers in Keras: demo_convolutions.ipynb.
  • Demo on training a convolutional network for the CIFAR-10 dataset: demo_cnn_classifier.ipynb. To make sure Colab is using a GPU, open the Runtime menu, choose Change runtime type, and select GPU under Hardware accelerator.
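In the spirit of the convolution demos (though not their actual code), a minimal Keras sketch of a small convolutional classifier for 32x32 RGB images with 10 classes:

    import tensorflow as tf

    # A tiny convolutional network: one conv layer, pooling, then a classifier.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()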
Unsupervised Learning
12. 11/22 Finish Convolutional Nets, Adversarial Examples, Auto-encoders
11/29 Thanksgiving break, no class.
13. 12/6 Principal Component Analysis, Semantic embeddings
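For a preview of the PCA material, a minimal numpy sketch of projecting data onto its top principal components via the SVD (the helper and data are illustrative only):

    import numpy as np

    def pca_project(X, k):
        """Project centered data onto its top-k principal components,
        computed via the SVD."""
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    print(pca_project(X, 2).shape)  # (100, 2)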
14. 12/11 Finish semantic embeddings, image generation, introduction to Reinforcement Learning
12/20 Final Exam