NYU CS-GY 6923
Machine Learning

A broad introduction to the exciting field of machine learning through a mixture of hands-on experience and theoretical foundations.

Course Team:

Professor
Christopher Musco

Course Assistant
Ozlem Yildiz

Course Assistant
Siddharth Sagar

Course Assistant
Thomas Liu

Lectures: 215 Rogers Hall, and virtually via Zoom (links on Brightspace).
Professor office hours: Weekly on Mondays 11am-1pm. Zoom link.
Thomas office hours: Weekly on Wednesdays 12-1pm. Zoom link.
Siddharth office hours: Weekly on Tuesdays 3-4pm. Zoom link.
Ozlem office hours: Weekly on Tuesdays 12-2pm. Zoom link.

Syllabus: here.
Grading breakdown: Written Problem Sets 25%, Programming Labs (including mini-project) 25%, Midterm 20%, Final Exam 20%, Participation 10%

Ed Stem: All course communication will be via Ed, so please create an account and join our site. All questions should also be posted to Ed (not sent via email). We prefer that questions about lectures or homework are asked publicly, since they will often help your classmates, but Ed supports private questions for things relevant only to you.

Python and Jupyter: Demos and labs in this class use Python, run through Jupyter notebooks. Jupyter lets you create and edit documents with live Python code and rich comments and images. We suggest that students run their Jupyter notebooks via Google Colaboratory, and we will share them via Colab. You also have the option of installing and running everything on your personal computer. Instructions can be found here.

Prerequisites: Modern machine learning uses a lot of math! Probably more than any other subject in computer science outside theoretical computer science. You can get pretty far with an understanding of just calculus, probability, and linear algebra, but that understanding needs to be solid for you to succeed in this course. Formally we require a prior course in probability or statistics. If you need to freshen up on linear algebra, this quick reference from Stanford is helpful.

Homework: Homework (both written problems and coding labs) must be turned in to Gradescope by the specified deadline. Use the code P5D5BP to join the class on Gradescope. We do not accept late work without prior permission.

Labs should be turned in as evaluated Jupyter notebooks. Do not clear the output before turning in. While not required, I encourage students to prepare written problem sets in LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you do write problems by hand, scan and upload as a PDF.

Discussion is allowed on homework, but solutions and code must be written independently. See the syllabus for details. We have a zero tolerance policy for copied code or solutions: any students with duplicate or very similar material will receive a zero on the offending assignment. My advice is to never share code or solutions with other students.

Resources: There is no textbook to purchase. I may post readings, some of which will come from the following book, which is available free online via the NYU library:

An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani.

I have also found the lectures and notes from the companion site to the book Learning From Data (ISBN 978-1-60049-006-4) to be very helpful. This is a compact, inexpensive book if you want to purchase it.

Final Project: See guidelines for the final project here.

Lecture #	Topic	Reading	Homework
Regression and Function Fitting
1. 1/27	Introduction to Machine Learning, Simple Linear Regression, Loss Functions	Lecture 1 slides (annotated). Probability Review: See this complete reference or the very good resources on Khan Academy. Linear Algebra Review: This should get you started.	Work through Demo 1 on numpy and working with arrays and plots (not turned in). Work through simple regression example in Demo 2. Complete Lab 1. Due 11:59pm, Thursday 2/3.
2. 2/3	Multiple Linear Regression, Data Transformations, Model Selection, Regularization	Lecture 2 slides (annotated). Notes on computing gradients (raw markdown). For additional reading, see Chapter 3.2 in An Introduction to Statistical Learning.	Complete written Homework 1. Due 11:59pm, Thursday 2/10. 10% bonus if you typeset solutions in Markdown or LaTeX! Work through additional numpy matrix demo: `demo_numpy_matrices.ipynb`. Work through multiple linear regression demo in `demo_diabetes.ipynb`. For Homework 1, it might be helpful to check your answer for Problem 4 using an approach similar to the one I implement here in `gradient_demo.ipynb`.
3. 2/10	Finish model selection, Regularization, Start Bayesian Perspective	Lecture 3 slides (annotated). Additional reading on feature selection: Chapter 6.1 in AISL. Additional reading on regularization: Chapter 6.2 in AISL.	Work through polynomial model selection demo: `demo_polyfit.ipynb`. Complete Lab 2.1. Due 11:59pm, Friday 2/18. Complete Lab 2.2. Due 11:59pm, Friday 2/18.
4. 2/17	Naive Bayes, the Bayesian Perspective	Lecture 4 slides (annotated). Additional lecture notes on the Naive Bayes Algorithm. Section 3 of these notes gives a nice overview of least squares regression from a statistical/probabilistic modeling perspective. Also see section on logistic regression.	Work through logistic regression demo: `demo_breast_cancer.ipynb`. Complete written Homework 2. Due 11:59pm, Tuesday 3/1. Problem 3 requires completing the code stub at `hw2_stub.ipynb`
Classification
5. 2/24	Linear Logistic Regression, Optimization, Gradient Descent	Lecture 5 slides (annotated). I filled in the proofs on pages 46 and 47 for convexity of the least squares loss. Notes on computing the gradient for logistic regression (optional to review).
6. 3/3	Optimization, Gradient Descent, Stochastic Gradient Descent	Lecture 6 slides (annotated). For information on the exam details, structure, and topics covered, consult Midterm 1 information. Here are some sample questions which will be similar to those on the exam.
7. 3/10	Midterm Exam (first half of class). Learning Theory, the PAC model	Lecture 7 slides (annotated). Notes from Nika Haghtalab on what we covered today. If you are interested in learning more (e.g. about infinite hypothesis classes) see her notes for later lectures.
3/17	Spring break, no class.
Beyond Linear Methods
8. 3/24	k-Nearest Neighbors, Kernel Methods	Lecture 8 slides (annotated). For additional reading and visualizations for k-NN classifiers, see Chapter 2.2 in An Introduction to Statistical Learning.	Complete Lab 3, `lab3.ipynb`. Due 11:59pm, Wed. 3/30. The first half of this lab is a demo, which you should go through slowly. The parts you actually have to fill in don't start until the "L2 Regularization" section.
9. 3/31	Support Vector Machines, Neural Networks 1: Introduction, History	Lecture 9 slides (annotated). For additional reading on SVMs, see Chapter 9 in An Introduction to Statistical Learning.	Complete written Homework 3. Due 11:59pm, Thursday 4/14. Spend 30 mins or so messing around with playground.tensorflow.org to build some intuition for working with neural nets! Work through SVM demo: `demo_mnist_svm.ipynb`. Complete Lab 4. Due 11:59pm, Thursday 4/14.
10. 4/7	Neural Networks 2: Backpropagation, Convolution	Lecture 10 slides (annotated).	Work through Keras neural network demo on synthetic data: `keras_demo_synthetic.ipynb`. Work through Keras neural network demo on MNIST data: `keras_demo_mnist.ipynb`.
11. 4/14	Convolution, Feature Extraction, Transfer Learning	Lecture 11 slides (annotated). For additional reading on generalization in neural nets, see Chapter 10.8 in An Introduction to Statistical Learning.	Work through demo on convolution and creating convolutional layers in Keras: `demo_convolutions.ipynb`. Work through demo training a convolutional network for the CIFAR-10 dataset: `demo_classifier.ipynb`. To make sure Colab is using a GPU, click on the Runtime tab and then Change runtime type. Select GPU under Hardware accelerator.
Unsupervised Learning
12. 4/21	Auto-encoders, Principal Component Analysis	Lecture 12 slides (annotated). Check out this research project to get a sense of some of the amazing things people are doing in generative ML.	Complete written Homework 4. Due 11:59pm, 5/9. You will need this data file: UScities.txt. OR complete a final project following the guidelines.
13. 4/28	Semantic Embeddings, Beyond Auto-encoders	Lecture 13 slides (annotated).
Selected Topics
14. 5/5	Introduction to Reinforcement Learning	Lecture 14 slides (annotated). Great book for additional reading on Reinforcement Learning.	For information on the exam details, structure, and topics covered, consult the final exam information.
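
The Homework 1 note under Lecture 2 suggests checking a hand-derived gradient against a computed one. One common way to do this is to compare the formula with a central finite-difference approximation. The sketch below is not taken from `gradient_demo.ipynb`, and it is only an assumption that the demo does something similar; the function names (`squared_loss`, `analytic_gradient`, `numerical_gradient`) and the toy data are hypothetical. It uses plain Python lists so it runs without extra packages, whereas the course demos use numpy.

```python
def squared_loss(beta, X, y):
    """Least-squares loss: sum_i (y_i - <x_i, beta>)^2."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        pred = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += (y_i - pred) ** 2
    return total


def analytic_gradient(beta, X, y):
    """Hand-derived gradient of the loss above: 2 * X^T (X beta - y)."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        residual = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += 2.0 * residual * x_ij
    return grad


def numerical_gradient(loss, beta, eps=1e-6):
    """Central finite-difference approximation of the gradient of `loss` at `beta`."""
    grad = []
    for j in range(len(beta)):
        plus = list(beta)
        plus[j] += eps
        minus = list(beta)
        minus[j] -= eps
        grad.append((loss(plus) - loss(minus)) / (2.0 * eps))
    return grad


# Toy data (hypothetical): 3 examples, 2 features (first column is an intercept).
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [3.0, 0.0, 1.5]
beta = [0.2, -0.4]

exact = analytic_gradient(beta, X, y)
approx = numerical_gradient(lambda b: squared_loss(b, X, y), beta)

# If the hand-derived formula is right, the two should agree closely.
max_diff = max(abs(a - b) for a, b in zip(exact, approx))
print("analytic: ", exact)
print("numerical:", approx)
print("max abs difference:", max_diff)
```

If the difference is large compared with `eps`, the hand-derived formula (or its implementation) likely has a sign or indexing error. To check a different loss, swap in your own loss function and gradient formula and evaluate at a few random points rather than a single one.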
