Hi! I'm a machine learning researcher interested in probabilistic methods.
I am currently working on code generation. I've previously dabbled in methods for efficient human-robot interaction, in particular drawing on Bayesian optimization, experimental design, and program synthesis for optimal and trustworthy communication.
A bit about me: I am currently a hacker at Cohere. I obtained my PhD from Cornell, advised by Sasha Rush. I started grad school at Harvard, also with Sasha. Before that, I was a research engineer at Facebook AI Research. And before all that, I scraped by as an undergrad at UPenn CIS.
Research Topics
Software Engineering Agents
How can we make robots better software engineers? As they become better engineers, do the principles that guide human-oriented design still hold for robots?
Interaction as optimal control
In human-robot interaction, many tasks are too complex to accomplish in a single turn. How can robots collaborate with humans by resolving ambiguity as efficiently as possible? We frame interaction as an optimal control problem, and explore simple heuristics.
- Symbolic Planning and Code Generation for Grounded Dialogue (EMNLP 2023)
- Asking More Informative Questions for Grounded Retrieval (ACL Findings 2024)
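To make the "simple heuristics" above concrete, here is a minimal toy sketch of one-step lookahead question selection: pick the yes/no question whose answer is expected to reduce uncertainty over candidate targets the most. The setup (a belief over candidates, a known answer matrix) and all names are illustrative assumptions, not the actual systems in these papers.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def select_question(belief, answers):
    """Pick the yes/no question with the highest expected information gain.

    belief:  (n_candidates,) prior over which candidate the human means.
    answers: (n_questions, n_candidates) boolean matrix; answers[q, c] is True
             if candidate c would elicit a "yes" to question q.
    """
    prior_h = entropy(belief)
    gains = []
    for q in range(answers.shape[0]):
        expected_posterior_h = 0.0
        for ans in (True, False):
            mask = answers[q] == ans
            p_ans = belief[mask].sum()
            if p_ans > 0:
                posterior = belief * mask / p_ans
                expected_posterior_h += p_ans * entropy(posterior)
        gains.append(prior_h - expected_posterior_h)
    return int(np.argmax(gains))

# 4 candidate referents, 2 candidate questions.
belief = np.ones(4) / 4
answers = np.array([[True, True, False, False],    # splits the candidates 2 / 2
                    [True, False, False, False]])  # splits them 1 / 3
print(select_question(belief, answers))  # -> 0: the balanced split is most informative
```

In a real interaction the answer matrix isn't given; the point of the control framing is that the robot reasons about what it would believe after each possible reply before committing to a question.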
Scaling discrete latent variable models
Discrete structure is common in the world (language, biology, code), and can also yield efficient or interpretable models. However, discrete structure makes learning difficult due to non-differentiability. Can we scale models with discrete structure? And what structural properties can we take advantage of?
- Simple and Effective Masked Diffusion Language Models (NeurIPS 2024)
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (ICLR 2025)
- Scaling Hidden Markov Language Models (EMNLP 2020)
- Low-Rank Constraints for Fast Inference in Structured Models (NeurIPS 2021)
- HOP, UNION, GENERATE: Unsupervised Multi-hop Reasoning (EMNLP 2023)
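To illustrate the non-differentiability issue mentioned above, here is a minimal, generic sketch of a straight-through Gumbel-softmax estimator in PyTorch: sampling a one-hot latent has zero gradient with respect to the logits, so the backward pass is routed through a soft relaxation instead. This is textbook machinery for discrete latents, not the specific method of any of the papers listed here.

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits, tau=1.0):
    """Sample a one-hot discrete latent, but pass gradients through a soft relaxation.

    A hard argmax / categorical sample alone has zero gradient w.r.t. the logits,
    which is exactly what makes discrete structure awkward to train end-to-end.
    """
    soft = F.gumbel_softmax(logits, tau=tau, hard=False)              # differentiable relaxation
    hard = F.one_hot(soft.argmax(dim=-1), logits.shape[-1]).float()   # discrete sample
    # Forward pass uses the discrete sample; backward pass sees the soft one.
    # (F.gumbel_softmax(..., hard=True) implements the same trick internally.)
    return hard + (soft - soft.detach())

logits = torch.randn(2, 8, requires_grad=True)   # 2 examples, 8 discrete states
z = straight_through_sample(logits)              # exactly one-hot per row
loss = (z * torch.arange(8.0)).sum()             # any downstream differentiable loss
loss.backward()
print(z, logits.grad is not None)                # gradients flow despite the discretization
```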