About Me
I'm a Research Scientist at Salesforce AI Research. I earned my PhD at Georgia Tech, where I was fortunate to be advised by
Mark Davenport and collaborate closely with Ashwin Pananjady. Prior to that, I earned my BSE at the
University of Michigan. During graduate school, I spent time interning at Duolingo, where I worked with Will Monroe, and at Amazon, where I worked with Arjun Seshadri,
Mariya Vasileva, and Achal Dave.
My current research broadly focuses on improving the reasoning abilities of foundation models, in particular large language models. I'm also interested in the role humans play in the era of large models: when
are human responses necessary, and when can we avoid collecting human feedback?
In my free time, I enjoy cooking (and eating), reading, running, and watching basketball (NBA and college).
Publications and preprints
* Denotes equal contribution
- Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou*, Austin Xu*, Peifeng Wang, Caiming Xiong, Shafiq Joty
arXiv 2025
- A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty
arXiv 2025
- Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu*, Srijan Bansal*, Yifei Ming, Semih Yavuz, Shafiq Joty
arXiv 2025
- Direct Judgement Preference Optimization
Peifeng Wang*, Austin Xu*, Yilun Zhou, Caiming Xiong, Shafiq Joty
arXiv 2024
- SFR-RAG: Towards Contextually Faithful LLMs
Xuan-Phi Nguyen, Shrey Pandit, Senthil Purushwalkam, Austin Xu, Hailin Chen, Yifei Ming, Zixuan Ke, Silvio Savarese, Caiming Xiong, Shafiq Joty
arXiv 2024
- Large Language Model Augmented Exercise Retrieval for Personalized Language Learning
Austin Xu, Will Monroe, Klinton Bicknell
Learning Analytics and Knowledge (LAK) 2024
Short version in the NeurIPS 2023 Workshop on Generative AI for Education (GAIED).
[arXiv]
- Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Austin Xu, Andrew D. McRae, Jingyan Wang, Mark A. Davenport, Ashwin Pananjady
NeurIPS 2023
Short version in the ICML 2023 Many Facets of Preference Learning Workshop.
[arXiv - extended version] [code]
- HandsOff: Labeled dataset generation with no additional human annotations
Austin Xu, Mariya I. Vasileva, Achal Dave, Arjun Seshadri
CVPR 2023
Highlight Award (top 2.5% of submissions, 26% conference acceptance rate)
Short version in the NeurIPS 2022 SyntheticData4ML Workshop.
[arXiv] [website] [code]
- Active Metric Learning and Classification using Similarity Queries
Namrata Nadagouda, Austin Xu, Mark A. Davenport
UAI 2023
Short version in the NeurIPS 2022 Workshop on Human in the Loop Learning.
[arXiv]
- Simultaneous Preference and Metric Learning from Paired Comparisons
Austin Xu and Mark A. Davenport
NeurIPS 2020
Spotlight Presentation (top 4% of submissions, 20% conference acceptance rate)
[arXiv] [website] [talk]
PhD Thesis: Learning with and without human feedback. Georgia Institute of Technology, 2024.
[local copy] [defense slides]
Experience
Work Experience
- AI Research Intern at Duolingo (Summer 2023)
- Applied Scientist Intern at Amazon (Summer, Fall 2022)
- R&D Summer Intern at Sandia National Laboratories (Summer 2018)
- Student Intern at General Motors (Summer 2017)
Teaching Experience
- Spring 2022: Head TA, Statistical Machine Learning (ECE 6254) [website]
- Fall 2019/Spring 2020/Summer 2020: TA, Professional and Technical Communications (ECE 3005)
- Fall 2018/Spring 2019: IA, Discrete Mathematics (University of Michigan -- EECS 203)