Per-Sample Invariant Tracking on MNIST with Certainty-Validity Diagnostics

License: CC BY 4.0
DOI: 10.21203/rs.3.rs-9075794/v1
Keywords: Artificial Intelligence and Machine Learning, certainty-validity, epistemic diagnostics, per-sample invariant tracking, MNIST, uncertainty quantification, calibration, overcommitment, synthetic diagnostic environment

Abstract

Standard evaluation metrics conflate different kinds of error. A confident incorrect prediction and an uncertain incorrect prediction are counted identically, even though they reflect different epistemic states. This paper uses the Certainty-Validity (CVS) framework to track how those states migrate during learning. The Minimal Operative Unit (MOU) ring serves as a synthetic diagnostic environment in which quadrant migration, lock thresholds, and intervention timing can be calibrated under controlled conditions before the same procedure is applied per-sample on MNIST. On the ring, baseline training drives CVS from 0.79 to 0.07 while accumulating confident-incorrect states, whereas CVS-regulated and CVS-gated regimes stabilize CVS in the 0.62-0.75 range and reduce confident-incorrect predictions by 11. On MNIST, freeze-based CVS interventions preserve aggregate CVS but also preserve still-learnable uncertainty, showing that aggregate control is too coarse on its own. Per-sample tracking localizes the residual instability to a small tail rather than broad model degradation: at confidence threshold theta = 0.7, MNIST exhibits no persistent uncertainty population, but it does exhibit a small persistent confident-error tail, a boundary-adjacent volatile set, and a small number of drift cases. In this setting, the main value of CVS is diagnostic: it identifies where commitment is stable, where it is forced, and where uncertainty remains unresolved during training.
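As an illustration of the per-sample bookkeeping described above, the following minimal sketch assigns each sample to a certainty-validity quadrant and flags samples that stay in one quadrant across epochs. It rests on assumptions not stated in this section: confidence is taken to be the maximum predicted class probability, a prediction counts as confident when that value is at least theta, and "persistent" means occupying the same quadrant at every recorded epoch. The function names `quadrants` and `persistent_mask` and the quadrant labels are hypothetical, and the sketch does not compute the CVS score itself.

```python
import numpy as np

THETA = 0.7  # confidence threshold quoted in the abstract

# Hypothetical quadrant labels, indexed by 2 * confident + correct.
QUADRANT_NAMES = np.array([
    "uncertain-incorrect",
    "uncertain-correct",
    "confident-incorrect",
    "confident-correct",
])


def quadrants(probs, labels, theta=THETA):
    """Assign each sample to a certainty-validity quadrant.

    probs:  (n_samples, n_classes) predicted class probabilities.
    labels: (n_samples,) integer class labels.
    Assumption: confidence is the maximum class probability.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confident = probs.max(axis=1) >= theta
    correct = probs.argmax(axis=1) == labels
    return QUADRANT_NAMES[2 * confident.astype(int) + correct.astype(int)]


def persistent_mask(history, quadrant):
    """Flag samples that occupy `quadrant` at every recorded epoch.

    history: (n_epochs, n_samples) array of quadrant labels.
    Assumption: "persistent" means present in all recorded epochs.
    """
    history = np.asarray(history)
    return (history == quadrant).all(axis=0)
```

Calling `quadrants` once per epoch and stacking the results gives the `history` array; `persistent_mask(history, "confident-incorrect")` then isolates candidates for a persistent confident-error tail, while samples whose label changes between epochs are candidates for the volatile or drift sets.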