Filip Sondej
2 papers · 2025 · 2 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (2)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (15)
❓ The Questioner
Conferences
AAAI (1)
EMNLP (1)
Keywords
direct preference optimization (1)
neural network analysis (1)
ai safety (1)
mechanistic interpretability (1)
safety fine-tuning (1)
activation editing (1)
defense mechanism (1)
neuron analysis (1)
toxicity reduction (1)
language model safety (1)
multi-agent system (1)
agent compromise (1)
malicious prompt (1)
security trade-off (1)
collaboration capability (1)
malicious instruction (1)