Large Language Models produce text responses to prompts, one token at a time. They have a "next token probability predictor", which predicts the probabilities of candidate next tokens, and a "sampler", a random number generator that chooses the next token from those candidates, weighted by their probabilities. It may seem amazing that such a system can produce sensible text responses to prompts, but it often does. Sometimes, however, it gives clearly wrong answers, and these are known as Hallucinations. Many experts consider the Hallucination Problem fundamentally unsolvable, since it is an inherent property of the Large Language Model architecture.
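
For readers who want to see the mechanics, here is a minimal Python sketch of that predictor-plus-sampler step, using a toy vocabulary and made-up logits (the raw scores a real model would produce). It illustrates the general sampling scheme only, not any particular model's implementation:

```python
import numpy as np

def softmax(logits):
    """Convert raw model scores into a probability distribution over candidates."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

def sample_next_token(logits, vocab, rng):
    """Weighted random choice of the next token, as the sampler does."""
    probs = softmax(logits)
    index = rng.choice(len(vocab), p=probs)
    return vocab[index], probs[index]

# Toy example: the "next token probability predictor" would supply these scores.
vocab = ["Paris", "London", "Rome", "banana"]
logits = np.array([3.2, 1.1, 0.7, -2.0])
rng = np.random.default_rng(0)
token, prob = sample_next_token(logits, vocab, rng)
print(f"sampled '{token}' with probability {prob:.2f}")
```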

At Neocortix, we've invented a Hallucination Detector by extending the architecture of the Large Language Model to give it a confidence measure for each next token chosen by the sampler. We have found that regions of low confidence are correlated with Hallucinations, so this confidence measure can be used to detect them. In the demonstration above, the Large Language Model is hallucinating book titles that were never written by Lloyd Watts, and the Hallucination Detector is successfully marking the hallucinated passages with a light blue highlight.
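
As a rough illustration of how a per-token confidence signal could drive this kind of highlighting, the sketch below uses the probability the model assigned to each emitted token as a stand-in confidence score and flags sustained low-confidence runs. The threshold and run-length parameters are hypothetical, and this is not the Neocortix method, which builds the confidence measure into the model architecture itself:

```python
def flag_low_confidence(token_probs, threshold=0.2, min_run=3):
    """Return (start, end) index spans where confidence stays below threshold.

    token_probs: probability assigned to each token the model actually emitted.
    A sustained run of low-probability tokens is treated as a candidate
    hallucination span, suitable for highlighting in a UI.
    """
    spans, start = [], None
    for i, p in enumerate(token_probs):
        if p < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    if start is not None and len(token_probs) - start >= min_run:
        spans.append((start, len(token_probs)))
    return spans

# Toy example: confident tokens, then a stretch of guesses (e.g., invented book titles).
probs = [0.92, 0.88, 0.95, 0.12, 0.08, 0.15, 0.10, 0.90, 0.85]
print(flag_low_confidence(probs))  # -> [(3, 7)]
```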

The Neocortix Hallucination Detector is an essential technology for detecting, monitoring, and eliminating Hallucinations in Large Language Models, and it can be further improved by combining its output with that of the Neocortix Deep Attribution Network.