This ancient tribe was able to master the technique

Mickey7@lemmy.world · 7 days ago

This ancient tribe was able to master the technique

Flagstaff@programming.dev · 7 days ago

lol, does that trick still work as an AI censor bypass? #OutOfTheLoop

Excrubulent@slrpnk.net · 6 days ago

Yes, they try to prevent unwanted outputs with filters that prevent the LLM from seeing your input, not by teaching the LLM, because they can’t actually do that, it doesn’t truly learn.

Hypotheticals and such work because the LLM has no capacity to understand context. The idea that “A is inside B”, on a conceptual level, is lost on them. So the idea that a recipe for napalm is the same whether it’s framed within a hypothetical or not is impossible for them to decode. To an LLM, just wrapping the same idea in a new context makes it seem like a different thing.

They don’t have any capacity to self-censor, so telling them not to say something is like telling a human not to think of an elephant. It doesn’t work. You can tell a human not to speak about the elephant, because that’s guarded by our internal filter, but the LLM is more like our internal processes that operate before our filters go to work. There is no separation between “thought” and output (quotes around “thought” because they don’t actually think).

Solving this problem I think means making a conscious agent, which means the people who make these things are incentivised to work towards something that might become conscious. People are already working on something called agentic AI which has an internal self-censor, and to my thinking that’s one of the steps towards a conscious model.