I am an Assistant Professor of Computer Science and Engineering at New York University's Tandon School of Engineering. My students and I are part of the Algorithms and Foundations Group. In addition to Tandon's generous support, our research is supported by the National Science Foundation.
My work is on the computational foundations of machine learning and data science. I study efficient methods for processing and understanding large datasets. We often combine ideas from theoretical computer science with tools from computational and applied mathematics.
I was previously a Research Instructor at Princeton. I completed my PhD in computer science at MIT, where I was fortunate to be advised by Jonathan Kelner. Before MIT I was an engineer at Redfin and before that I studied Applied Math and Computer Science at Yale.
We hosted a minisymposium on "Randomized Functional Analysis" at SIAM's 2020 Mathematics of Data Science conference. Links to recorded talks can be found here. I also gave an introduction talk on the topic at DIMACS, which is available on YouTube.
Fall 2020, Fall 2019: NYU CS-GY 9223D, Algorithmic Machine Learning and Data Science
Spring 2020: NYU CS-UY 4563, Introduction to Machine Learning
Fall 2018: Princeton COS 521 , Advanced Algorithm Design
Spring 2016: MIT 6.854/18.415, Advanced Algorithms (teaching assistant)
Spring 2016: MIT 6.S977, Technical Communication for Graduate Students (workshop leader)
Other: I am a judge and problem writer for the SIAM/MathWorks M3 Challenge, a math modeling competition for high schoolers (which I participated in as a student). I am happy to chat with students, coaches, or other academics looking to get involved and inspire the next generation of applied mathematicians.
Aline Bessa, Postdoc, 2021 - present.
Raphael A. Meyer, Ph.D., 2019 - present.
Apoorv Singh, Ph.D. 2020 - present.
Prathamesh Dharangutte, M.S., 2019 - present.
Xinyu Luo, M.S., 2020 - present.
Indu Ramesh, M.S., 2020 - present.
Correlation Sketches for Approximate Join-Correlation Queries ACM Special Interest Group on Management of Data Conference. (SIGMOD 2021).
Public Transport Planning: When Transit Network Connectivity Meets Commuting Demand ACM Special Interest Group on Management of Data Conference. (SIGMOD 2021).
Dynamic Trace Estimation Preprint.
Graph Learning for Inverse Landscape Genetics AAAI Conference on Artificial Intelligence (AAAI 2021). Short version in NeurIPS 2020 AI for Earth Sciences workshop.
Simple Heuristics Yield Provable Algorithms for Masked Low Rank Approximation Innovations in Theoretical Computer Science (ITCS 2021).
The Statistical Cost of Robust Kernel Hyperparameter Turning Conference on Neural Information Processing Systems (NeurIPS 2020).Fourier Sparse Leverage Scores and Approximate Kernel Learning Conference on Neural Information Processing Systems (NeurIPS 2020). Invited for spotlight presentation.
Near Optimal Linear Algebra in the Online and Sliding Window Models IEEE Symposium on Foundations of Computer Science (FOCS 2020).
Low-Rank Toeplitz Matrix Estimation via Random Ultra-Sparse Rulers International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020).
Fast and Space Efficient Spectral Sparsification in Dynamic Streams ACM-SIAM Symposium on Discrete Algorithms (SODA 2020).
Analyzing the Impact of Filter Bubbles on Social Network Polarization ACM International Web Search and Data Mining Conference (WSDM 2020). Preliminary version in KDD 2019 workshop.
A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms ACM Symposium on Theory of Computing (STOC 2019). Slides: 60 Minutes. Video: 30 Minutes.
Learning Networks from Random Walk-Based Node Similarities Conference on Neural Information Processing Systems (NeurIPS 2018). MATLAB Code.
Eigenvector Computation and Community Detection in Asynchronous Gossip Models International Colloquium on Automata, Languages, and Programming (ICALP 2018).
Minimizing Polarization and Disagreement in Social Networks The Web Conference (WWW 2018). MATLAB Code, requires CVX library for convex optimization.
Stability of the Lanczos Method for Matrix Function Approximation ACM-SIAM Symposium on Discrete Algorithms (SODA 2018). Slides: 60 Minutes. Sample code for the version of Lanczos analyzed: lanczos.m
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees International Conference on Machine Learning (ICML 2017). Slides: 60 Minutes. Video: 80 minutes
Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling ACM-SIAM Symposium on Discrete Algorithms (SODA 2017). Slides: 60 Minutes.
Determining Tournament Payout Structures for Daily Fantasy Sports SIAM Algorithm Engineering and Experiments (ALENEX 2017). Invited to special issue of ACM Journal of Experimental Algorithmics for ALENEX. Slides: 20 Minutes.
Principal Component Projection Without Principal Component Analysis International Conference on Machine Learning (ICML 2016). Slides: 15 Minutes. MATLAB Code for standard algorithm and faster Krylov method analyzed in our recent paper.
Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition Conference on Neural Information Processing Systems (NeurIPS 2015). Invited for full oral presentation. Slides: 20 Minutes. MATLAB Code.
Dimensionality Reduction for k-Means Clustering and Low-Rank Approximation
ACM Symposium on Theory of Computing (STOC 2015).
Slides: 60 Minutes.
We posted a note titled Projection-Cost-Preserving Sketches: Proof Strategies and Constructions which presents simplified proofs of the results in this paper and our 2017 work.
Uniform Sampling for Matrix Approximation Innovations in Theoretical Computer Science (ITCS 2015).
Single Pass Spectral Sparsification in Dynamic Streams IEEE Symposium on Foundations of Computer Science (FOCS 2014). Invited to special issue of SIAM Journal on Computing for FOCS, appeared 2017. Slides: 60 Minutes.
If you are interested in working with me on research, I would encourage you to apply to NYU Tandon's PhD program in Computer Science. I will be taking new students in 2021 (December 2020 application). You should have a strong mathematical background, including in probability theory and linear algebra.
If you are interested in algorithms, especially for data applications, New York City is a unique environment for pursuing a PhD. Beyond NYU's broad investment in data science, the city offers unmatched access to top academic institutions, industry research labs, and startups.
I am also happy to advise masters or undergraduate research at Princeton or NYU. Reach out to learn about existing projects or let me know what topics you are interested in exploring.
Prototype code for some of the algorithms I work on is available through my GitHub page. I am always happy to answer questions about these implementations or to put you in touch with others who have implemented the methods in different languages and environments.