There’s a quiet shift underway in machine learning, and it’s not about scale. It’s unfolding in the internal workings of language models trained to do nothing but predict tokens, where behavior starts to resemble understanding (something once reserved for sentience).

This paper (Emergent Representations of Program Semantics in Language Models Trained on Programs) walks through a seemingly innocuous experiment that turns out to have some rather profound implications. The authors trained a model on code, using nothing more than next-token prediction, and then tested whether internal abstractions emerged that reflect the semantics of that code.

They do.
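
To make the setup concrete, here is a minimal sketch of the kind of probing experiment this implies: freeze a trained code model, record its hidden states on program prefixes, and train a small linear classifier to read off the abstract program state. Everything below (the `hidden_states`, the `semantic_labels`, the dimensions) is stand-in data for illustration, not the paper’s actual setup.

```python
# Illustrative sketch only: probe a code LM's hidden states for program semantics.
# The tensors here are random placeholders standing in for real activations and
# real ground-truth program states.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Pretend we ran N program prefixes through a frozen code LM and saved one
# hidden vector per prefix (dimension D), plus a label for the abstract
# program state at that point (one of K possible states).
N, D, K = 2048, 256, 4
hidden_states = torch.randn(N, D)            # stand-in for LM activations
semantic_labels = torch.randint(0, K, (N,))  # stand-in for ground-truth states

# A linear probe: if a simple linear map can recover the program state from
# the activations, that state is encoded there in an easily readable form.
probe = nn.Linear(D, K)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(probe(hidden_states), semantic_labels)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = probe(hidden_states).argmax(dim=-1)
    accuracy = (preds == semantic_labels).float().mean().item()

# On real activations (and a held-out split), high probe accuracy is the
# signal that the semantic state is represented inside the model.
print(f"probe accuracy: {accuracy:.2%}")
```

The probe is kept deliberately simple on purpose: the question is not whether a powerful classifier can find the program state, but whether a nearly trivial readout can.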

The model formed internal abstractions because prediction demanded internal coherence. The researchers then went beyond surface-level observation: they intervened directly, manipulating these abstractions and watching what changed.

Remove the abstraction, it turns out, and performance drops. Tug on the abstract representation, and the output shifts.
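
One way to picture that kind of intervention, again as an illustrative sketch rather than the authors’ exact procedure: take a direction in activation space that a probe associates with the program state, project it out of the hidden states, and let the model continue from the edited activations. The `ablate_direction` helper below is a generic activation-editing utility, hypothetical rather than drawn from the paper.

```python
# Illustrative sketch of the intervention step: edit the representation that
# encodes the program state and watch whether the model's output shifts.
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of each hidden vector along `direction`."""
    direction = direction / direction.norm()
    return hidden - (hidden @ direction).unsqueeze(-1) * direction

torch.manual_seed(0)
D = 256
hidden_states = torch.randn(8, D)   # stand-in LM activations for 8 prefixes
state_direction = torch.randn(D)    # stand-in for a probe-derived direction

edited = ablate_direction(hidden_states, state_direction)

# Sanity check: the edited states carry no component along the probed
# direction any more (up to floating-point error).
residual = edited @ (state_direction / state_direction.norm())
print(residual.abs().max())

# In a real experiment, `edited` would be patched back into the LM's forward
# pass (e.g. via a hook on the relevant layer), and the model's continuations
# compared before and after the edit.
```

If behavior degrades only when that direction is removed, and shifts predictably when it is steered, the representation is doing causal work rather than sitting there as a merely decodable correlate.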

The abstractions that formed aren’t incidental; they carry causal weight.

It’s not unlike jazz. A musician may never write down the rules they’re following, but over time, their internal sense of form, rhythm, and resolution grows into something coherent and expressive. Take away that felt structure and the improvisation collapses, because the sense of direction is gone. These abstractions aren’t memorized. They emerge. And they matter.

This challenges one of the laziest assumptions in modern AI criticism: that predictive modeling is inherently shallow. That without explicit supervision of meaning, you get only noise or mimicry. But what if the pressure to predict well is enough to force the model to build something real inside? Something functional. Something abstract.

That’s the thread: semantic meaning, formed spontaneously, as a byproduct of necessity and architecture.

This forces us to reframe, or at least rethink, the argument around language models. If abstract structure can give rise to generalization (if a model begins forming concepts out of architectural need), then prediction isn’t the opposite of thought.

It may be its origin.