

You could do this with logprobs. The language model itself has essentially no insight into its own confidence, but you can get more out of the model than just the text.
The problem is that those probabilities really measure “how confident are you that this text should come next in this conversation,” not “how confident are you that this text is true/accurate.” I think that's a fundamental limitation at the moment.
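Here is a minimal sketch of turning logprobs into a rough score. It assumes you have already pulled the per-token log probabilities (natural log) for one completion out of whatever API you use. The function name `sequence_confidence` and the three summaries it returns are my own illustrative choices, not any particular library's API. All three numbers only say how likely the model found the text, which is exactly the limitation above.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> dict[str, float]:
    """Summarize per-token natural-log probabilities for one completion.

    Hypothetical helper: these are likelihood scores, not truthfulness scores.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    total = sum(token_logprobs)
    # Joint probability of the whole sequence (product of token probabilities).
    joint = math.exp(total)
    # Geometric mean of token probabilities; does not shrink with length.
    per_token = math.exp(total / len(token_logprobs))
    # Probability of the least likely token, often where the model "hesitated".
    weakest = math.exp(min(token_logprobs))
    return {"joint": joint, "per_token": per_token, "weakest": weakest}


if __name__ == "__main__":
    # Made-up logprobs for a three-token answer, purely for illustration.
    print(sequence_confidence([-0.01, -0.2, -1.5]))
```

A fluent but wrong answer can still score high on all three, because the model may find a confident-sounding falsehood very likely to come next.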
Also, “Thou mayest blame” and “Canst thou say” hurt my brain a little.