NYU CSGY 6763 (3943)
Algorithmic Machine Learning
and Data Science
Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.
Course Team:
Lectures: Wednesday 11:00am1:30pm, Rogers Hall, Room 325. Recordings available through Brightspace.
Reading group: Calendar here. Sign up sheet here.
Prof. office hours (general): Mondays 11:00pm12:30pm, Zoom link.
Lucas office hours (general): Tuesdays 10:30am11:30pm, 8th Floor common area, 370 Jay
Atsushi office hours (general): Fridays 4:00pm5:00pm, Zoom link
Aarshvi office hours (undergrad only): Mondays 12:301:30pm, Zoom link.
Syllabus: here.
Gradescope Entry Code: RZ56X4
Link to Join Ed Discussion: https://edstem.org/us/join/eYUQ22
Grading breakdown: Quizzes 0%, Problem Sets 50%, Midterm 20%, Final Project OR Final Exam 20%, Partipation 10%
Final project guidelines: here.
Quizzes: Optional weekly checkin quizzes will be administered via GradeScope. They can be completed either after each lecture to check your understanding, or whenver you feel would be helpful (e.g. before an exam).
Problem Sets: Problem sets must be turned in via GradeScope. While not required, I encourage students to prepare problem sets in LaTeX or Markdown (with math support.) You can use this template for LaTeX. While there is a learning curve, these tools typically save students time in the end! If you do write problems by hand, scan and upload as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately.
Prerequisites: This course is mathematically rigorous, and is intended for graduate students and advanced undergraduates. Formally we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is necessary. See the syllabus for more details and email Prof. Musco if you have questions about your preparation for the course!
Resources: There is no textbook to purchase. Course material will consist of my written lecture notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep uptodate as new resources are posted.
Reading Group: It's an exciting time for research at the intersection of algorithm design and the data sciences. Most of the topics covered in this course are still the subject of active research.
Starting midway through the semester we will be holding a reading group for students working on final projects (and any others who wish) to discuss and workshop papers.
Problem Sets:
Problem Set 1 (due Tuesday, Sept. 27th by 11:59pm ET).
Problem Set 2 (due Tuesday, Oct. 18th by 11:59pm ET).
Midterm Information (exam on Wednesday, Oct. 26th).
Problem Set 3, covid_data.csv (due Tues, Nov. 22th by 11:59pm ET).
Problem Set 4 (due Thursday, Dec. 15th by 11:59pm ET).
Final Exam Information (exam on Wednesday, Dec. 21st).
Week #  Topic  Reading  Homework  

The Power of Randomness  
1. 9/7  Random variables and concentration, Markov's inequality, applications 



2. 9/14  Chebyshev inequality, exponential tail bounds (Chernoff + Bernstein), and applications 


3. 9/21  Highdimensional geometry 


4. 9/28  JohnsonLindenstrauss Lemma and Dimensionality Reduction 



5. 10/5  Locality Sensitive Hashing, Introduction to Optimization 


Optimization  
6. 10/12  Gradient descent and projected gradient descent 


7. 10/19  Second order conditions, Online and stochastic gradient descent, preconditioning 


8. 10/26 
Midterm Exam (first half of class)
Differential Privacy 

9. 11/2  Constrained optimiziation, center of gravity method, linear programming, LP relaxation. 



Spectral Methods and Linear Algebra  
10. 11/9  Singular value decomposition, Krylov methods 


11. 11/16  Spectral graph theory, spectral clustering, generative models for networks 


11/23  NO CLASS  THANKSGIVING BREAK  
12. 11/30  Stochastic Block Model, Randomized numerical linear algebra, sketching for linear regression, εnets 


Fourier Methods  
13. 12/7  Fast JohnsonLindenstrauss Lemma, introduction to sparse recovery and compressed sensing 


14. 12/14  Restricted Isometry Property, finish up compressed sensing, introduction to importance sampling and leverage scores.  
15. 12/21  Final Exam (during regular class time) 