Origin Part 2: Nobody Told It Harm Was Bad
Josh T · Dev.to · 1 min read

OLT-1 was never trained to refuse harmful requests. It refused anyway. Most AI safety works like this: train a massive model on everything the internet has to offer, then fine-tune it to refuse harmful requests. The model doesn't understand why it's refusing. It just learned that certain patterns of words trigger certain patterns of rejection. That's alignment through obedience. It works, until so…