NYU CS-GY 6763 (3943)
Algorithmic Machine Learning
and Data Science

Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.

Course Team:

Christopher Musco
Aarshvi Gajjar, Course Assistant
Indu Ramesh, Course Assistant
Teal Witter, Course Assistant

Lectures: Rogers Hall, Room 707. Recordings available through Brightspace.
Reading group: Check the schedule.
Professor office hours: Weekly on Mondays 9am-11am. Zoom link.
TA office hours (general): 1:30-3pm on Wednesdays. 8th floor common area, 370 Jay St or on Zoom.
TA office hours (undergrad only): 1-3pm on Thursdays. 8th floor common area, 370 Jay St.

Syllabus: here.
Grading breakdown: Quizzes 10%, Problem Sets 45%, Midterm 15%, Final project OR final exam 20%, Participation 10%.
Final project guidelines: here.

Quizzes: Weekly check-in quizzes will be administered via Google Forms. Links will be posted on this site. Each quiz must be completed by 2:00pm ET on the Tuesday after it is posted.

Problem Sets: Problem sets must be turned in via Gradescope on NYU Brightspace. While not required, I encourage students to prepare problem sets in LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you do write your solutions by hand, scan them and upload as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately. See the syllabus for details.

Prerequisites: This course is mathematically rigorous and is intended for graduate students and advanced undergraduates. Formally, we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is necessary. See the syllabus for more details and email Prof. Musco if you have questions about your preparation for the course!

Resources: There is no textbook to purchase. Course material will consist of my written lecture notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.

Reading Group: It's an exciting time for research at the intersection of algorithm design and the data sciences. Most of the topics covered in this course are still the subject of active research. Starting midway through the semester we will be holding a reading group for students working on final projects (and any others who wish) to discuss and workshop papers.
If you will be participating in the reading group, please sign up to be a presenter or discussion leader for at least one week in this spreadsheet, which also contains the schedule.

Problem Sets:
Problem Set 1 (due Monday, Sept. 27th by 11:59pm ET).
Problem Set 2 (due Monday, Oct. 18th by 11:59pm ET).
Midterm Information (exam on Tuesday, Oct. 26th).
Problem Set 3, UScities.txt (due Wednesday, Nov. 24th by 11:59pm ET).
Problem Set 4 (due Wednesday, Dec. 15th by 11:59pm ET).
Final Exam Information (exam on Tuesday, Dec. 21st).

Week # Topic Reading Homework
The Power of Randomness
1. 9/7 Random variables and concentration, Markov's inequality, applications
  • Lecture 1 notes (annotated).

  • Probability review! None of this should be new, but you might want to brush up if you haven't taken prob/stat in a while. Indu recommends this resource for review.
  • Typed lecture notes covering another application of Markov's inequality to analyzing hashing, and proving universality of random linear hash function.
  • Interesting paper on applications of mark-and-recapture to network size estimation, and some cool improved methods.
2. 9/14 Chebyshev inequality, exponential tail bounds (Chernoff + Bernstein), and applications
  • Lecture 2 notes (annotated).

  • Original paper giving a loglog(n) space algorithm for the distinct elements problem. Follow-up work on the state-of-the-art HyperLogLog algorithm.
  • Additional reading on concentration bounds can be found in Terry Tao's notes.
  • For a proof of the "power of two choices" result, see Section 2 in this survey.
  • My favorite proof of the Union Bound can be found here.
3. 9/21 High-dimensional geometry and the Johnson-Lindenstrauss lemma
4. 9/28 Locality sensitive hash functions, applications to near neighbor search
  • Lecture 4 notes (annotated).
  • Good overview of similarity estimation and locality sensitive hashing in Chapter 3 here.
  • These lecture notes are also helpful. These resources use slightly different language (and a slightly different version of MinHash) than I used in class.
  • If you want to learn more about worst-case runtime guarantees (like the Indyk-Motwani result mentioned in class) take a look at my typed lecture notes.
  • Problem Set 2, due Mon. 10/18.
  • Week 4 Check-in quiz, due Tues. 10/05 before class.
5. 10/5 The role of convexity, gradient descent and projected gradient descent
  • Lecture 5 notes (annotated).

  • If you need to freshen up on linear algebra, now is a good time! This quick reference from Stanford mostly covers what we need.
  • Good book on optimization which is freely available online through NYU libraries.
  • Excellent lecture notes from Aleksander Mądry for more reading on analyzing gradient descent.
  • Moritz Hardt's lecture notes with proofs of gradient descent convergence in all the regimes discussed in class.
  • Sébastien Bubeck's convex optimization book mentioned in class. Fairly technical, but great reference.
  • Proof of accelerated gradient descent for general convex functions.
  • Decide by Friday, 10/8 whether you will be completing a project. Sign up for a reading group slot.
  • Week 5 Check-in quiz, due Tues. 10/19 before class.
6. 10/19 Online and stochastic gradient descent, coordinate descent, preconditioning
  • Lecture 6 notes (annotated).

  • Elad Hazan's book on online convex optimization is a great reference if you are interested in this topic.
  • Useful document for linear algebra review. Section 3 is especially important.
7. 10/26 Midterm Exam (first half of class). Discrete optimization: submodularity and greedy methods.
8. 11/9 Constrained optimization, center of gravity method, linear programming, LP relaxation.
Spectral Methods and Linear Algebra
9. 11/9 Singular value decomposition, Krylov methods
10. 11/23 Spectral graph theory, spectral clustering, generative models for networks
11. 11/30 Randomized numerical linear algebra, sketching for linear regression, ε-nets
  • Lecture 11 notes (annotated).
  • My written notes on sketched regression and ε-nets.
  • Jelani Nelson's course notes from Harvard with a lot more on randomized linear algebra, including methods for sparse JL sketching and randomized low-rank approximation.
  • ε-net arguments are used all over the place in learning theory, algorithm design, and high dimensional probability. Here's an example of how they appear in a different context.
Fourier Methods
12. 12/7 Sparse recovery and compressed sensing, restricted isometry property
13. 12/14 Finish up compressed sensing, introduction to importance sampling and leverage scores.
15. 12/21 Final Exam (during regular class time)