Cybernetic Interoception and the 83% Ambiguity Ceiling
Note: This post expands on the philosophical and cybernetic framing around the CSST control-signal line. It builds on the earlier CyVy diagnostic framework, but it is not the same paper.
The Biological Interoceptive Analogy
Organisms do not wait for autopsy reports to regulate metabolism; they use interoceptive signals—gut feelings, proprioception, fatigue—as real-time feedback that modulates behavior.
In standard machine learning, metrics like accuracy or AUROC are autopsies. We train a model to convergence (or collapse), evaluate it on a holdout set, and then try to diagnose what went wrong.
With the earlier CyVy diagnostic framework and the later CSST control-signal line, we asked a different question: what if a learning system had interoception? What if it possessed a live control signal—a measure of “how well am I knowing?”—that gated how aggressively the system commits to a belief?
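To make the idea concrete, here is a minimal sketch of what such a gate might look like. Everything in it is an illustrative assumption, not a detail from the CyVy or CSST papers: the control signal is modeled as normalized negative entropy of the output distribution, and the 0.5 commitment threshold is arbitrary.

```python
import numpy as np

def control_signal(probs):
    """Hypothetical interoceptive signal: normalized negative entropy.
    Returns 1.0 when the distribution is maximally certain, 0.0 when uniform."""
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs))
    return 1.0 - entropy / np.log(len(probs))

def gated_commit(probs, threshold=0.5):
    """Commit to a class only when the control signal clears the threshold;
    otherwise withhold commitment (return None, i.e. 'I don't know')."""
    if control_signal(probs) >= threshold:
        return int(np.argmax(probs))
    return None

print(gated_commit(np.array([0.97, 0.01, 0.02])))  # confident -> commits (0)
print(gated_commit(np.array([0.4, 0.35, 0.25])))   # ambiguous -> abstains (None)
```

The point of the sketch is structural: the certainty estimate is computed *live*, inside the decision path, rather than after the fact from a holdout set.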
The UI → CI Pathology
When you train a discrete commitment architecture (like Eidos) on ambiguous data (like the overlapping classes in Fashion-MNIST or mixed-sentiment IMDB reviews), the system naturally hits a ceiling—often around 83%.
But the architecture isn’t failing. It is appropriately refusing to commit when the structural evidence ends. This is the Uncertain-Incorrect (UI) state. It is the epistemic equivalent of saying, “I don’t know.”
The pathology arises when we force the system to keep learning past this point using standard, unregulated training loops. The system undergoes a phase transition where it migrates from Uncertain-Incorrect (UI) to Confident-Incorrect (CI). It stops being appropriately doubtful and starts hallucinating structure where none exists.
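One way to watch this phase transition is to bucket predictions into confidence-correctness quadrants at successive checkpoints. The sketch below is an illustrative diagnostic, assuming a simple confidence threshold `tau` that is not a value from the papers; under continued unregulated training, the expectation is that the UI share shrinks while the CI share grows.

```python
from collections import Counter

def epistemic_quadrant(confidence, correct, tau=0.5):
    """Classify one prediction into a confidence-correctness quadrant.
    tau is a hypothetical confidence threshold chosen for illustration."""
    c = "C" if confidence >= tau else "U"   # Confident / Uncertain
    v = "C" if correct else "I"             # Correct / Incorrect
    return c + v                            # one of: CC, CI, UC, UI

def quadrant_mix(confidences, corrects, tau=0.5):
    """Fraction of predictions falling into each quadrant."""
    counts = Counter(epistemic_quadrant(c, k, tau)
                     for c, k in zip(confidences, corrects))
    total = sum(counts.values())
    return {q: counts.get(q, 0) / total for q in ("CC", "CI", "UC", "UI")}

print(quadrant_mix([0.9, 0.9, 0.3, 0.3], [True, False, True, False]))
# {'CC': 0.25, 'CI': 0.25, 'UC': 0.25, 'UI': 0.25}
```

Logging this mix over training steps turns the UI-to-CI migration from a post-hoc diagnosis into something observable as it happens.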
Unvalidated Extrapolations: Implications for LLMs
While our empirical validations in the CSST paper focus on the MOU ring diagnostic and the Eidos CNN on MNIST/CIFAR, the implications for autoregressive Large Language Models are profound.
LLMs currently possess no such interoceptive signal. They produce confident-sounding outputs on nearly every prompt, with no internal mechanism for detecting when they are operating beyond their epistemic competence.
The CyVy diagnostic and CSST results together suggest that “hallucination” in LLMs may not be a bug of the transformer architecture itself. Rather, it might be an inevitable consequence of training a system past its own Control-Signal Stabilization Threshold (CSST) without any epistemic feedback loop. It is the same UI → CI migration we observed in our focused experiments, operating at massive scale.
If a similar interoceptive loop could be implemented at inference time for an LLM—where the model monitors its own certainty-validity estimate and withholds commitment when that estimate degrades—it could constitute a structural, cybernetic mitigation of the hallucination problem, reducing reliance on post-hoc RLHF filtering.
The Paradigm Shift
We must move away from evaluating reasoning systems purely on aggregate scalar accuracy. A model that achieves 83% accuracy and knows exactly why it is uncertain about the remaining 17% is far more valuable—and safer—than a model that achieves 85% by confidently hallucinating its way through edge cases.
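A simple evaluation that respects this distinction is selective accuracy: score the model only on the examples it chooses to answer, and report coverage alongside. The sketch below assumes abstentions are encoded as `None`; it is an illustrative metric, not one from the papers.

```python
def selective_report(preds, labels):
    """Accuracy over answered examples plus coverage, where preds may
    contain None for abstentions. Illustrative metric for selective
    prediction, not a measure defined in the CyVy/CSST papers."""
    answered = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(answered) / len(preds)
    accuracy = (sum(p == y for p, y in answered) / len(answered)
                if answered else 0.0)
    return {"coverage": coverage, "selective_accuracy": accuracy}

# An abstaining model: right on what it answers, honest about the rest.
print(selective_report([1, 0, None, None], [1, 0, 1, 0]))
# {'coverage': 0.5, 'selective_accuracy': 1.0}
```

Reporting the (coverage, selective accuracy) pair instead of a single scalar makes an honest 83% distinguishable from a hallucinated 85%.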
True intelligence requires epistemic humility. It’s time our architectures reflected that.