NYU CS-GY 9223I
Algorithmic Machine Learning
and Data Science

Advanced topics course exploring contemporary algorithmic techniques and recent research on computational methods that enable machine learning and data science at scale.

Course Instructor: Professor Christopher Musco


Course Summary

Prerequisites: This course is mathematically rigorous, and is intended for graduate students or advanced undergraduates. I require a previous course in machine learning (for example, CS-UY 4563, CS-GY 6923, or ECE-GY 6143) and a previous course in algorithm design and analysis (for example, CS-UY 2413, CS-GY 6033, or CS-GY 6043). Experience with linear algebra and probability is also necessary. Email the course instructor if you have questions about your preparation for the course! Undergraduates wishing to enroll will be registering for CS-UY 3943.

Coursework: One meeting per week. 4 problem sets involving analysis and application of algorithmic methods learned in class, potentially with programming exercises to further explore lecture content (40% of grade). Midterm exam (20% of grade). Final exam (30% of grade). Class participation is the remaining 10% of the grade. Please consult the formal syllabus for more information.

Resources: There is no textbook to purchase. Course material will consist of my written lecture notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.

Homework:
Homework 1 (due Thursday, Sept. 26th by 11:59pm).
Homework 2 (due Thursday, Oct. 17th by 11:59pm).
Homework 3, UScities.txt (due Monday, Dec. 2nd by 11:59pm).
Homework 4 (due Monday, Dec. 16th by 11:59pm).


Administrative Information

Lectures: Wednesdays 3:20-5:50pm in Rogers Hall, RGSH 707.

Syllabus: here.

Office hours: Thursdays, 3-5pm, or by appointment. 370 Jay St., Office 1105 (11th floor).
Reading Group: 10am Tuesdays, 370 Jay St. Room 1114.

Homework: While not required, I encourage students to prepare problem sets in LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve for LaTeX (less so for Markdown), it typically saves students lots of time by the end of the semester!

For regular homework problems, collaboration is allowed, but solutions and any code must be written independently. Students must list their collaborators on their problem sets (for each problem separately). See the syllabus for full details.

Optional Reading Group: It's an exciting time for research at the intersection of algorithm design and the data sciences. Many of the ideas covered in this course are still the subject of active research. We currently hold a reading group to discuss recent papers on Tuesdays at 10am in Room 1114, 370 Jay St. If you are interested in participating in the reading group, please email me.

Schedule. Each entry below lists the lecture number and date, topic, required reading (lecture notes), and optional reading.
The Power of Randomness
1. 9/4 Concentration of random variables + applications to hashing and load balancing Lecture 1 notes.
  • In practice, efficient "pseudorandom" hash functions need to be used in place of "fully random" functions. See these notes for details on one class of simple hash functions called "2-universal" hash functions, which can be used for all applications discussed in class.
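To make the hashing discussion concrete, here is a small illustrative sketch (not an official course resource) that draws a hash function from the 2-universal family h(x) = ((ax + b) mod p) mod n mentioned in the note above and checks the maximum load when n random keys are hashed into n buckets; the prime p and all parameters are chosen just for illustration.

```python
import random
import numpy as np

# Draw a hash function from the 2-universal family
#   h(x) = ((a*x + b) mod p) mod n_buckets,
# with p prime and a, b chosen uniformly at random (a != 0).
p = 2**31 - 1  # a Mersenne prime larger than any key hashed below

def random_hash(n_buckets):
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda x: ((a * x + b) % p) % n_buckets

# Hash n random keys into n buckets and look at the maximum load. With fully
# random hashing the max load is O(log n / log log n) with high probability;
# 2-universal hashing has weaker worst-case guarantees but behaves similarly
# on typical inputs.
n = 10_000
h = random_hash(n)
keys = np.random.default_rng(0).integers(0, p, size=n)
loads = np.zeros(n, dtype=int)
for key in keys:
    loads[h(int(key))] += 1
print("maximum bucket load:", loads.max())
```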
2. 9/11 Chernoff bounds + sketching and streaming algorithms Lecture 2 notes. (annotated)
  • For a proof of the "power of two choices" result, see Section 2 in this survey.
  • Additional reading on concentration bounds can be found in Terry Tao's notes.
  • See this paper for the latest on streaming distinct elements estimation in practice.
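For a feel of what a small-space streaming sketch looks like, here is an illustrative (unofficial) k-minimum-values estimator for the distinct elements problem: hash every item to (0, 1), keep only the k smallest distinct hash values, and estimate the count from the k-th smallest. All names and parameters below are made up for the example.

```python
import hashlib
import numpy as np

def hash_to_unit(item):
    # Deterministic "random looking" hash of an item to a value in (0, 1).
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def estimate_distinct(stream, k=128):
    # K-minimum-values sketch: keep the k smallest distinct hash values seen.
    kept = set()
    for item in stream:
        h = hash_to_unit(item)
        if h in kept:
            continue
        if len(kept) < k:
            kept.add(h)
        else:
            largest = max(kept)
            if h < largest:
                kept.discard(largest)
                kept.add(h)
    if len(kept) < k:
        return len(kept)           # stream had fewer than k distinct items
    return (k - 1) / max(kept)     # classic KMV estimate of the distinct count

# A stream with 5,000 distinct items, each repeated many times.
rng = np.random.default_rng(0)
stream = rng.integers(0, 5_000, size=100_000)
print("estimate:", round(estimate_distinct(stream)),
      "| true:", len(set(stream.tolist())))
```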
3. 9/18 Sketching, the Johnson-Lindenstrauss lemma + applications Lecture 3 notes. (annotated)
  • Book chapter on Jaccard similarity, MinHash, etc.
  • Helpful notes on the JL Lemma by Anupam Gupta.
  • Cool paper on an application of MinHash to detecting seismic events.
  • Original paper giving a loglog(n) space algorithm for the distinct elements problem.
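A minimal (unofficial) numpy illustration of the Johnson-Lindenstrauss lemma: projecting points through a random Gaussian matrix with k = O(log(n)/ε²) rows approximately preserves pairwise distances. The dimensions below are arbitrary.

```python
import numpy as np

# Johnson-Lindenstrauss in a few lines: project n points in R^d down to
# k dimensions with a random Gaussian matrix; pairwise distances are
# preserved up to a (1 +/- eps) factor with high probability.
rng = np.random.default_rng(0)
n, d, k = 500, 10_000, 400

X = rng.standard_normal((n, d))                 # n synthetic points in R^d
Pi = rng.standard_normal((k, d)) / np.sqrt(k)   # random JL sketch matrix
Y = X @ Pi.T                                    # compressed points in R^k

# Compare one pairwise distance before and after projection.
i, j = 0, 1
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(Y[i] - Y[j])
print(f"original distance {orig:.2f}, projected distance {proj:.2f}")
```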
4. 9/25 Nearest neighbor search + locality sensitive hashing Lecture 4 notes. (annotated)
  • Good overview of similarity estimation and locality sensitive hashing in Chapter 3 here. There are also lecture notes. These resources use slightly different language (and a slightly different version of MinHash) than I used in class.
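A toy (unofficial) sketch of locality sensitive hashing for cosine similarity using random hyperplanes (SimHash), a different LSH family than the MinHash-based scheme in the readings: each point gets a short bit signature in each of several hash tables, and a query only inspects the buckets it collides with. Parameters are illustrative.

```python
import numpy as np
from collections import defaultdict

# Random-hyperplane (SimHash) LSH: near points collide in some table with
# good probability, so a query only needs to scan a few small buckets.
rng = np.random.default_rng(0)
n, d, bits, tables = 2_000, 100, 16, 10

X = rng.standard_normal((n, d))
all_planes, all_buckets = [], []
for _ in range(tables):
    planes = rng.standard_normal((bits, d))      # random hyperplanes
    codes = (X @ planes.T > 0)                   # n x bits sign patterns
    buckets = defaultdict(list)
    for idx, code in enumerate(codes):
        buckets[code.tobytes()].append(idx)
    all_planes.append(planes)
    all_buckets.append(buckets)

# Query: a slightly perturbed copy of point 0 should collide with point 0
# in at least one of the hash tables.
q = X[0] + 0.1 * rng.standard_normal(d)
candidates = set()
for planes, buckets in zip(all_planes, all_buckets):
    candidates.update(buckets.get((planes @ q > 0).tobytes(), []))
print("candidate set size:", len(candidates),
      "| contains point 0:", 0 in candidates)
```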
First-Order Optimization
5. 10/2 Convexity in machine learning + vanilla, stochastic, and online gradient descent Lecture 5 notes. (annotated)
  • If you need to freshen up on linear algebra, now is a good time! This quick reference from Stanford mostly covers what we need.
  • Good book on optimization which is freely available online through NYU libraries.
  • Excellent lecture notes from Aleksander Mądry for more reading on analyzing gradient descent.
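For reference, a bare-bones (unofficial) implementation of vanilla gradient descent on a least-squares objective, using the standard 1/L step size where L is the smoothness constant; the synthetic data and iteration count are arbitrary.

```python
import numpy as np

# Vanilla gradient descent for least squares f(w) = ||Xw - y||^2, a smooth
# convex problem; a fixed step size of 1/L, with L the smoothness constant
# (largest eigenvalue of 2 X^T X), guarantees convergence.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

L = 2 * np.linalg.eigvalsh(X.T @ X).max()   # smoothness constant
w = np.zeros(d)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y)            # gradient of ||Xw - y||^2
    w -= grad / L                           # fixed step size 1/L

print("distance to true weights:", np.linalg.norm(w - w_true))
```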
6. 10/9 Finishing up SGD, smoothness, strong convexity, conditioning. Lecture 6 notes. (annotated)
  • Moritz Hardt's lecture notes with proofs of gradient descent convergence in all the regimes discussed in class.
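And a matching (unofficial) sketch of stochastic gradient descent on the same kind of least-squares problem; the decaying step-size schedule below is one common choice, not the only one analyzed in class.

```python
import numpy as np

# Stochastic gradient descent on least squares: each step uses the gradient
# of a single random training example, an unbiased but noisy estimate of the
# full gradient, so the per-step cost drops from O(nd) to O(d); a small,
# decaying step size controls the extra variance.
rng = np.random.default_rng(0)
n, d = 1_000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
print("starting distance to true weights:", np.linalg.norm(w - w_true))
for t in range(1, 20_001):
    i = rng.integers(n)                       # pick one example at random
    grad_i = 2 * (X[i] @ w - y[i]) * X[i]     # stochastic gradient
    w -= (0.01 / np.sqrt(t)) * grad_i         # decaying step size
print("final distance to true weights:", np.linalg.norm(w - w_true))
```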
7. 10/16 Preconditioning, acceleration, randomized coordinate descent, adaptive methods. Lecture 7 notes. (annotated)
  • See Section 5 (the rest is not relevant) of Aleksander Mądry's lecture here for an overview of preconditioning, which will be covered in class.
  • Moritz Hardt's lecture notes are a good reference for coordinate descent and acceleration, which will be covered if we have time.
  • Proof of accelerated gradient descent for general convex functions.
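An illustrative (unofficial) sketch of randomized coordinate descent for least squares: each step exactly minimizes over a single randomly chosen coordinate and touches only one column of X, so the per-step cost is O(n) rather than O(nd). Problem sizes are arbitrary.

```python
import numpy as np

# Randomized coordinate descent for ||Xw - y||^2: pick a coordinate j at
# random and minimize exactly over w_j, which only needs column j of X.
rng = np.random.default_rng(0)
n, d = 500, 50
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
residual = X @ w - y                  # maintained incrementally
col_norms = (X ** 2).sum(axis=0)      # squared column norms
for _ in range(20 * d):
    j = rng.integers(d)
    step = -(X[:, j] @ residual) / col_norms[j]   # exact 1-d minimization
    w[j] += step
    residual += step * X[:, j]

print("final objective ||Xw - y||^2:", np.linalg.norm(residual) ** 2)
```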
8. 10/23 Midterm exam (first half of class); finishing up coordinate descent, optimization beyond convexity (second half of class) Lecture 8 notes. (annotated)
  • Diverse, theory-leaning blog that discusses many problems and approaches for understanding optimization of non-convex problems.
  • Another useful document for linear algebra review. Section 3 is especially important.
Spectral Methods and Linear Algebra
9. 10/30 Singular value decomposition, Power Method, Krylov methods Lecture 9 notes. (annotated)
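A minimal (unofficial) sketch of the power method for the top eigenvector of a symmetric matrix: repeatedly multiply a random start vector by A and normalize, then read off the eigenvalue with a Rayleigh quotient. The test matrix is synthetic.

```python
import numpy as np

# Power method: the iterate converges to the top eigenvector at a rate
# governed by the gap between the top two eigenvalues of A.
rng = np.random.default_rng(0)
d = 200
B = rng.standard_normal((d, d))
A = B @ B.T                       # symmetric PSD test matrix

z = rng.standard_normal(d)
for _ in range(200):
    z = A @ z
    z /= np.linalg.norm(z)

top_eig = z @ A @ z               # Rayleigh quotient estimate of lambda_1
print("power method:", top_eig, "| exact:", np.linalg.eigvalsh(A).max())
```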
10. 11/6 Finish up Krylov methods, spectral clustering, and spectral graph theory. Lecture 10 notes. (annotated)
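A toy (unofficial) example of spectral partitioning: on a random graph with two planted clusters, the sign pattern of the graph Laplacian's second-smallest eigenvector (the Fiedler vector) recovers the clusters. Edge probabilities are made up for the demo.

```python
import numpy as np

# Spectral partitioning on a two-block random graph: within-cluster edges are
# denser than cross edges, and the Fiedler vector separates the two blocks.
rng = np.random.default_rng(0)
n = 200                               # 100 nodes per planted cluster
labels = np.repeat([0, 1], n // 2)
p_in, p_out = 0.20, 0.02
prob = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1); A = A + A.T        # symmetric adjacency, no self-loops

L = np.diag(A.sum(axis=1)) - A        # graph Laplacian D - A
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]               # eigenvector of 2nd-smallest eigenvalue
predicted = (fiedler > 0).astype(int)

accuracy = max(np.mean(predicted == labels), np.mean(predicted != labels))
print("cluster recovery accuracy:", accuracy)
```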
11. 11/13 Finish spectral graph theory, start randomized linear algebra, sketching for linear regression, ε-net arguments Lecture 11 notes. (annotated)
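An illustrative (unofficial) sketch-and-solve example for linear regression: compress a tall least-squares problem with a small random Gaussian sketch S, solve the much smaller problem min_w ||SXw - Sy||², and compare its cost to the exact solution. Sizes are arbitrary.

```python
import numpy as np

# Sketch-and-solve regression: replace the n x d problem with an m x d one,
# where d < m << n, by multiplying X and y with a random sketch S.
rng = np.random.default_rng(0)
n, d, m = 10_000, 50, 500

X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)
w_sketch = np.linalg.lstsq(S @ X, S @ y, rcond=None)[0]
w_exact = np.linalg.lstsq(X, y, rcond=None)[0]

err_sketch = np.linalg.norm(X @ w_sketch - y)
err_exact = np.linalg.norm(X @ w_exact - y)
print("cost ratio (sketched / optimal):", (err_sketch / err_exact) ** 2)
```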
12. 11/20 Randomized numerical linear algebra, the fast Johnson-Lindenstrauss transform, sampling. Lecture 12 notes. (annotated)
  • Jelani Nelson's course notes from Harvard with a lot more on randomized linear algebra, including methods for sparse JL sketching and randomized low-rank approximation.
  • ε-net arguments are used all over the place in learning theory, algorithm design, and high dimensional probability. Here's an example of how they appear in a different context.
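A small (unofficial) sketch of one standard fast Johnson-Lindenstrauss construction, the subsampled randomized Hadamard transform: flip random signs, apply a fast O(d log d) Walsh-Hadamard transform, and keep a rescaled random subset of coordinates, which approximately preserves the norm of a vector. The dimension must be a power of 2 here, and all sizes are illustrative.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform (len(x) must be a power of 2).
    x = x.copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)             # orthonormal scaling

rng = np.random.default_rng(0)
d, k = 2 ** 12, 256
x = rng.standard_normal(d)

signs = rng.choice([-1.0, 1.0], size=d)          # random diagonal sign matrix
coords = rng.choice(d, size=k, replace=False)    # random coordinate sample
sketch = np.sqrt(d / k) * fwht(signs * x)[coords]

print("||x||:", np.linalg.norm(x), "| ||sketch||:", np.linalg.norm(sketch))
```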
11/27 No Class, Thanksgiving
Fourier Methods
13. 12/4 Compressed sensing, the restricted isometry property, basis pursuit, the discrete Fourier transform Lecture 13 notes. (annotated)
  • Last year's lecture notes on compressed sensing and basis pursuit.
  • Lots of great resources under weeks 8 and 9 of Stanford's Algorithmic Toolbox course.
  • Webpage on Sparse Fourier Transform algorithms and some applications.
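An illustrative (unofficial) compressed sensing experiment: recover a sparse vector from far fewer random linear measurements than its dimension. For simplicity, the recovery below uses iterative soft thresholding on an ℓ1-regularized least-squares (LASSO) objective rather than solving the basis pursuit linear program exactly; parameters are arbitrary.

```python
import numpy as np

# Observe b = A z for a k-sparse signal z, with a random Gaussian A that has
# far fewer rows than columns, then recover z by iterative soft thresholding.
rng = np.random.default_rng(0)
d, m, k = 1_000, 200, 10

z_true = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
z_true[support] = rng.standard_normal(k)

A = rng.standard_normal((m, d)) / np.sqrt(m)     # RIP-style measurement matrix
b = A @ z_true

lam = 0.02                                        # l1 regularization weight
step = 1 / np.linalg.norm(A, 2) ** 2              # 1 / ||A||_2^2 step size
z = np.zeros(d)
for _ in range(5_000):
    z = z - step * A.T @ (A @ z - b)              # gradient step on ||Az - b||^2 / 2
    z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)   # soft threshold

print("relative recovery error:",
      np.linalg.norm(z - z_true) / np.linalg.norm(z_true))
```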
14. 12/11 High dimensional geometry, maybe some more Fourier methods Lecture 14 notes. (annotated)
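Two quick (unofficial) numerical illustrations of high dimensional geometry: independent Gaussian vectors in R^d have norms tightly concentrated around √d, and random pairs of them are nearly orthogonal. Dimensions are arbitrary.

```python
import numpy as np

# Concentration of measure in high dimensions: norms of Gaussian vectors
# concentrate around sqrt(d), and random directions are nearly orthogonal.
rng = np.random.default_rng(0)
d, trials = 10_000, 200

X = rng.standard_normal((trials, d))
norms = np.linalg.norm(X, axis=1)
cosines = [X[i] @ X[i + 1] / (norms[i] * norms[i + 1])
           for i in range(trials - 1)]

print("norms: mean", norms.mean(), "std", norms.std(),
      "(sqrt(d) =", np.sqrt(d), ")")
print("max |cosine| between consecutive vectors:", np.max(np.abs(cosines)))
```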
15. 12/18 Final exam