Propensity Inference: Environmental Contributors to LLM Behaviour
Olli Järviniemi, Oliver Makins, Jacob Merizian, Robert Kirk, Ben Millwood · arXiv cs.AI
arXiv:2604.21098v1 · Announce Type: new
Abstract: Motivated by loss-of-control risks from misaligned AI systems, we develop and apply methods for measuring language models' propensity for unsanctioned behaviour. We contribute three methodological improvements: analysing effects of changes to environmental factors on behaviour, quantifying effect sizes via Bayesian generalised linear models, and taki…
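The abstract mentions quantifying the effect of environmental factors with Bayesian generalised linear models. The sketch below shows one minimal way such an effect size could be estimated. Its assumptions are illustrative and are not taken from the paper. Each environment configuration yields a count of rollouts in which the unsanctioned behaviour occurred. The counts follow a binomial GLM with a logit link, and every coefficient gets an independent Normal(0, 2²) prior. The posterior is sampled with a plain random-walk Metropolis algorithm. The counts, the single on/off factor, and the helper names (`log_posterior`, `sample_posterior`, `summarise`) are hypothetical.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def log_posterior(beta, rows, prior_sd=2.0):
    """Unnormalised log posterior of a binomial GLM with a logit link.

    rows: list of (x, successes, trials), where x is the covariate vector
    for one environment configuration (first entry 1.0 = intercept).
    Prior (an assumption): independent Normal(0, prior_sd**2) per coefficient.
    """
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, s, n in rows:
        eta = sum(b * xj for b, xj in zip(beta, x))
        # Binomial log-likelihood up to a constant: s*eta - n*log(1 + e^eta)
        lp += s * eta - n * log1pexp(eta)
    return lp


def sample_posterior(rows, n_draws=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler over the GLM coefficients."""
    rng = random.Random(seed)
    beta = [0.0] * len(rows[0][0])
    current = log_posterior(beta, rows)
    draws = []
    for i in range(burn_in + n_draws):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, rows)
        # Accept with probability min(1, exp(candidate - current));
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta)
    return draws


def summarise(draws, j):
    """Posterior mean, 95% interval and P(coef > 0) for coefficient j."""
    vals = sorted(d[j] for d in draws)
    n = len(vals)
    mean = sum(vals) / n
    lo, hi = vals[int(0.025 * n)], vals[int(0.975 * n) - 1]
    p_pos = sum(v > 0 for v in vals) / n
    return mean, (lo, hi), p_pos


# Hypothetical counts, invented for illustration (not from the paper):
# the behaviour occurs in 10/100 rollouts at baseline and in 30/100
# rollouts when a single environmental factor is switched on.
rows = [
    ([1.0, 0.0], 10, 100),  # baseline environment
    ([1.0, 1.0], 30, 100),  # factor switched on
]
draws = sample_posterior(rows)
mean, (lo, hi), p_pos = summarise(draws, 1)
print(f"log-odds effect {mean:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
print(f"exp(posterior mean) = {math.exp(mean):.2f}, P(effect > 0) = {p_pos:.3f}")
```

The coefficient on the factor is a log-odds effect size. Exponentiating it gives the multiplicative change in the odds of the behaviour when the factor is switched on. The posterior interval and P(effect > 0) express how uncertain that effect is. In practice, a library sampler such as Stan or PyMC would typically replace the hand-written Metropolis loop.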
This article was sourced from arXiv cs.AI's RSS feed. Visit the original for the complete story.