le_throosh@lemmy.dbzer0.com to Fuck AI@lemmy.world · 28 days ago
bro (image, thelemmy.club) · cross-posted to: comicstrips@lemmy.world
pkjqpg1h@lemmy.zip · 28 days ago
According to the AA-Omniscience benchmark, among the most expensive models, Opus 4.6 has a 60% hallucination rate and a 46% accuracy rate. Gemini 3.1 Pro Preview has a 50% hallucination rate and a 55% accuracy rate. And the questions aren’t even open-ended. I don’t even need to tell you about the other models.
Kairos@lemmy.today · 27 days ago (edited)
“Opus 4.6”, like every other LLM, has a 100% hallucination rate because that’s literally the only thing they do.