Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
Anmol Goel, Cornelius Emde, Sangdoo Yun, Seong Joon Oh, Martin Gubri
DATA-FM @ ICLR 2026
We identify a novel phenomenon in language models: benign fine-tuning of frontier models can lead to privacy collapse. Diverse, subtle patterns in training data can degrade contextual privacy: fine-tuned models lose the ability to reason about contextual privacy norms and violate memory boundaries across contexts, while still scoring highly on common safety and utility benchmarks.
MASEval: A Framework-Agnostic Evaluation Library for Multi-Agent Systems
Cornelius Emde, Alexander Rubinstein*, Anmol Goel*, Ahmed Heakl*, Sangdoo Yun, Seong Joon Oh, Martin Gubri
work in progress
Responsible Evaluation of AI for Mental Health
Hiba Arnaout, Anmol Goel, H. Andrew Schwartz, Steffen T. Eberhardt, Dana Atzil-Slonim, Gavin Doherty, Brian Schwartz, Wolfgang Lutz, Tim Althoff, Munmun De Choudhury, Hamidreza Jamalabadi, Raj Sanjay Shah, Flor Miriam Plaza-del-Arco, Dirk Hovy, Maria Liakata, Iryna Gurevych
under review
Auditing Language Model Unlearning via Information Decomposition
Anmol Goel, Alan Ritter, Iryna Gurevych
European Chapter of the Association for Computational Linguistics (EACL), 2026
We expose a key limitation in current unlearning approaches: information about forgotten data can remain linearly decodable from internal representations. We propose an information-theoretic audit using Partial Information Decomposition and show that residual redundant information correlates with attack vulnerability.
Differentially Private Steering for Large Language Model Alignment
Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal
International Conference on Learning Representations (ICLR), 2025
+ Theory and Practice of Differential Privacy (TPDP), 2025
We propose Private Steering for LLM Alignment (PSA), a differentially private method for activation editing. PSA provides strong privacy guarantees while preserving alignment behavior, generation quality, and reasoning performance across multiple model families and sizes.
Socratic Reasoning Improves Positive Text Rewriting
Anmol Goel, Nico Daheim, Christian Montag, Iryna Gurevych
CLPsych Workshop, Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025
We augment positive text rewriting with Socratic rationales built from question-answer chains. The resulting approach improves quality under automatic and human evaluation criteria motivated by psychotherapy practice.
From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences
Prashant Kodali, Anmol Goel, Likhith Asapu, Vamshi Krishna Bonagiri, Anirudh Govil, Monojit Choudhury, Manish Shrivastava, Ponnurangam Kumaraguru
ACM Transactions on Asian and Low-Resource Language Information Processing (ACM TALLIP), 2025
We construct Cline, a large dataset of human acceptability judgements for English-Hindi code-mixed text, and show that fine-tuned multilingual models outperform metric-based and baseline approaches in this setting.
An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy
Anmol Goel, Charu Sharma, Ponnurangam Kumaraguru
Empirical Methods in Natural Language Processing (EMNLP), 2022
We introduce an unsupervised framework for polysemy quantification using contextual representations and syntactic structure, yielding stronger correlation with expert lexical resources across multiple languages.
SyMCoM - Syntactic Measure of Code Mixing: A Study of English-Hindi Code-Mixing
Prashant Kodali, Anmol Goel, Monojit Choudhury, Manish Shrivastava, Ponnurangam Kumaraguru
Findings of the Association for Computational Linguistics (ACL), 2022
We introduce SyMCoM, a syntax-based measure of code-mixing variety, and demonstrate its utility for analyzing differences across English-Hindi datasets.
HLDC: Hindi Legal Documents Corpus
Arnav Kapoor, Mudit Dhawan, Anmol Goel, T.H. Arjun, Akshala Bhatnagar, Vibhu Agrawal, Amul Agrawal, Arnab Bhattacharya, Ponnurangam Kumaraguru, Ashutosh Modi
Findings of the Association for Computational Linguistics (ACL), 2022
We introduce the Hindi Legal Documents Corpus (HLDC), a large corpus of Hindi legal text, and study bail prediction as a downstream use case for legal NLP in low-resource settings.