NYU CS-GY 6763 (3943)
Algorithmic Machine Learning
and Data Science

Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.


Course Team:

Christopher Musco
Professor
Noah Amsel
Course Assistant, Recitation Leader
Pratyush Avi
Course Assistant

Lectures: Friday 2:00pm-4:30pm, Jacobs (6 MetroTech), Room 775B. Live stream and recordings available through Brightspace.
Reading group: More info TBA.
Professor office hours: Wednesdays 9:00am-10:30am, Zoom link.
Noah problem solving session: TBA
Avi office hours: Wednesdays 3:00pm-4:30pm, 8th Floor Common area, 370 Jay

Grading breakdown: Problem Sets 45%, Midterm 25%, Final Project OR Final Exam 20%, Participation 10%

Problem Sets: Problem sets must be turned in via Gradescope. While not required, I encourage students to prepare problem sets in LaTeX or Markdown (with math support). You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you do write your solutions by hand, scan them and upload as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately.

Unless otherwise stated, referencing "non-standard" theorems and proofs not given in class or in previous problems is not allowed. All solutions must be proven from scratch. If you are unsure whether you can use a fact, ask on Ed.

Prerequisites: This course is mathematically rigorous and is intended for graduate students and advanced undergraduates. Formally, we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is necessary. See the syllabus for more details and email Prof. Musco if you have questions about your preparation for the course!

Resources: There is no textbook to purchase. Course material will consist of my slides, lecture notes scribed by Teal Witter, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.

Reading Group: It's an exciting time for research at the intersection of algorithm design and the data sciences. Most of the topics covered in this course are still the subject of active research. Starting a few weeks into the semester we will be holding a reading group for students working on final projects (and any others who wish) to discuss and workshop papers.

Problem Sets:
Problem Set 1 (due Thursday, Feb. 6th by 11:59pm ET).

Week # Topic Reading Homework
The Power of Randomness
1. 1/24 Random variables and concentration, Markov's inequality, applications
2. 1/31 Chebyshev's inequality and applications
3. 2/7 Exponential tail bounds (Chernoff + Bernstein), efficient hash functions
4. 2/14 High-dimensional geometry, Johnson-Lindenstrauss lemma and dimensionality reduction
5. 2/21 Nearest neighbor search
Optimization
6. 2/28 Gradient descent and projected gradient descent
7. 3/7 Online and stochastic gradient descent
8. 3/14 Midterm Exam (first half of class). Guest lecture second half of class (topic TBA).
9. 3/21 Center of gravity method, ellipsoid method, LP relaxation
3/28 NO CLASS. SPRING BREAK
10. 4/4 TBD
Spectral Methods and Linear Algebra
11. 4/11 Singular value decomposition, Krylov subspace methods
12. 4/18 Spectral graph theory, spectral clustering, stochastic block model
13. 4/25 Randomized numerical linear algebra, sketching for linear regression, ε-nets, Fast Johnson-Lindenstrauss Lemma
14. 5/2 Sparse recovery and compressed sensing
15. 5/9 Final Exam (2pm in regular classroom)