Imagine a vast library, perfectly organised. Every book has its place, the right shelf, the right row. Now a completely new book must be added — one that fits no existing category.
To insert it, you would have to restructure the entire filing system. Not one shelf. Everything. And in doing so, other books would inevitably shift.
That is exactly what happens in large language models — mathematically provable, unavoidable, independent of the algorithm.
In dense neural networks no concept is stored locally. Every concept is distributed simultaneously across all billions of parameters — like a hologram. Inserting a new concept means re-exposing the entire hologram.
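The hologram metaphor can be made concrete with a toy dense associative memory (an illustrative numpy sketch, not the construction from any paper): writing a single new key–value pair with one gradient step produces an outer-product update that moves every entry of the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy dense associative memory: rows of K map to rows of V via K @ W ≈ V.
K = rng.standard_normal((5, d))
V = rng.standard_normal((5, d))
W, *_ = np.linalg.lstsq(K, V, rcond=None)

# Insert ONE new association with a single gradient step on ||k @ W - v||².
k = rng.standard_normal(d)
v = rng.standard_normal(d)
grad = np.outer(k, k @ W - v)   # rank-1, but dense: k and the residual are dense
delta = -0.1 * grad             # the weight change for this one new pair

# Every single entry of W moves: the new pair is written into the whole "hologram".
print(np.count_nonzero(delta) == W.size)  # True: all 64 entries change
```

There is no subset of entries that could be updated in isolation; the new association is smeared across the full matrix.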
There is an important difference between "we haven't solved it yet" and "it cannot be solved in principle." The second case is an impossibility theorem, of the same kind as the Abel–Ruffini theorem: a general fifth-degree polynomial equation cannot be solved in radicals.
The Graz-based researcher Andreas Bean has provided exactly such a proof. The core result was independently verified by two proof assistants, Lean 4 and Isabelle/HOL, with zero unverified assumptions.
Every neural network stores concepts as patterns in a weight matrix. This matrix has an internal mathematical order — the so-called eigenstructure — that determines which concepts are similar and how they relate to one another.
The proof shows: every introduction of a structurally new concept necessarily changes this entire eigenstructure. All existing concepts are shifted. This applies to every algorithm, every optimiser, every learning rate.
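A minimal numerical illustration of this claim (a toy sketch, not the paper's proof): adding one generic new concept vector as a rank-1 update to a symmetric relational matrix leaves none of its eigen-directions intact.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# Symmetric relational matrix built from d existing concept vectors (full rank).
C = rng.standard_normal((d, d))
W = C.T @ C

# One new concept enters as a rank-1 update of the matrix.
c_new = rng.standard_normal(d)
W2 = W + np.outer(c_new, c_new)

# Check: does ANY eigenvector of the updated matrix survive from the old one?
# If u were still an eigenvector of W, the residual W u - (uᵀW u) u would be zero.
_, evecs2 = np.linalg.eigh(W2)
residuals = [np.linalg.norm(W @ u - (u @ W @ u) * u) for u in evecs2.T]
print(min(residuals) > 1e-6)   # True: every eigen-direction has shifted
```

A generic new concept overlaps with every existing eigenvector, so the rank-1 update rotates the entire eigenbasis, exactly the global shift the theorem describes.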
Case 1 — New facts about existing concepts: The model knows "Vienna" and should learn that the city has a new mayor. This is possible in principle, though risky: the update can degrade related knowledge.
Case 2 — Structurally new concepts: A concept with no similarity to anything in the model. Here the theorem applies: any introduction necessarily changes the entire inner order.
A second paper shows that the proof transfers exactly to transformer architectures — the architecture underlying all modern AI systems such as GPT, Claude, and Gemini.
The key is a mathematical equivalence: the update rule of a modern continuous Hopfield network is exactly transformer attention (Ramsauer et al., 2021). The proof therefore carries over to transformers automatically.
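The equivalence is easy to verify numerically. The sketch below (toy dimensions, random patterns) implements one retrieval step of a modern Hopfield network and single-head attention; with keys and values set to the stored patterns, the two computations are identical.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 64, 10
X = rng.standard_normal((n, d))   # n stored patterns, one per row
beta = 1.0                        # inverse temperature / attention scale

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One update step of a modern continuous Hopfield network (Ramsauer et al., 2021).
def hopfield_update(q):
    return softmax(beta * X @ q) @ X

# Single-query, single-head attention: softmax(beta · q Kᵀ) V.
def attention(q, K, V):
    return softmax(beta * K @ q) @ V

# With keys = values = stored patterns, attention IS the Hopfield retrieval step.
q = X[3] + 0.1 * rng.standard_normal(d)        # corrupted copy of pattern 3
assert np.allclose(hopfield_update(q), attention(q, X, X))
print(int(np.argmax(X @ hopfield_update(q))))  # 3: the query snaps back to pattern 3
```

The stored patterns play the role of keys and values; retrieval from the Hopfield memory and attending over those patterns are the same softmax-weighted average.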
| Method | Why it doesn't help |
|---|---|
| Fine-tuning | Changes the entire eigenstructure globally. |
| LoRA (low-rank adapters) | Limits the rank of the change, not its global effect. |
| EWC (Elastic Weight Consolidation) | Protects individual weights, not the relational geometry between concepts. |
| ROME / MEMIT (direct model editing) | Patches weights directly; disturbs all pairwise relations. |
| RAG ✓ | The only structural escape: new knowledge never enters the weights. |
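The LoRA row can be checked in a few lines (a toy sketch; `W`, `B`, `A` are random stand-ins, not real model weights): the adapter update is rank-1, yet every singular direction of the adapted matrix differs from the original.

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 8, 1

W = rng.standard_normal((d, d))   # frozen base weights
B = rng.standard_normal((d, r))   # LoRA adapter: a rank-r update B @ A
A = rng.standard_normal((r, d))
W_adapted = W + B @ A             # the CHANGE is low-rank ...

# ... but its EFFECT is global: no singular direction of W survives unchanged.
# v is a right singular vector of W iff it is an eigenvector of M = WᵀW.
M = W.T @ W
_, _, Vt = np.linalg.svd(W_adapted)
residuals = [np.linalg.norm(M @ v - (v @ M @ v) * v) for v in Vt]
print(min(residuals) > 1e-6)      # True: every right singular vector has rotated
```

Restricting the rank of the update bounds how much is written, not where the geometric consequences land.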
The decisive difference lies in addressability. In the human brain every synapse has a physical address — independent of what the network has learned. A new connection disturbs only the immediate neighbourhood.
In a neural network like GPT, no parameter has such an address. The concept "cat" is not stored in specific parameters — it is encoded in all billions of parameters simultaneously, as a global pattern. There are no "cat parameters" one could touch.
Every time OpenAI, Anthropic, or Google want a model to absorb new data, they must retrain it on all old and new data together; otherwise existing knowledge degrades. Costs therefore grow with every update cycle. This is one reason training large models costs hundreds of millions of euros.
The way out would be an architecture with an explicit, locally addressable topology, like the biological brain. Current transformers structurally lack this property. Retrieval-Augmented Generation (RAG) is the practical workaround available today: new concepts are stored in an external database without touching the weights.
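A minimal sketch of the RAG idea (hypothetical `add_fact`/`retrieve` helpers, random embeddings standing in for a real encoder): new facts go into an external store, and the frozen weight matrix is never modified.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 32

# Frozen model weights: never modified below.
W_frozen = rng.standard_normal((d, d))

# External store: new "concepts" live here, not in the weights.
store_keys = []    # embedding vectors
store_texts = []   # associated facts

def add_fact(embedding, text):
    store_keys.append(embedding)   # O(1) append; W_frozen is untouched
    store_texts.append(text)

def retrieve(query_embedding):
    sims = np.array(store_keys) @ query_embedding   # dot-product similarity
    return store_texts[int(np.argmax(sims))]

# Adding a brand-new concept disturbs nothing that was stored before.
e1, e2 = rng.standard_normal(d), rng.standard_normal(d)
add_fact(e1, "fact A")
snapshot = W_frozen.copy()
add_fact(e2, "fact B")             # structurally new entry
print(retrieve(e1), np.array_equal(W_frozen, snapshot))  # fact A True
```

Because insertion is an append to the store rather than a weight update, the eigenstructure argument never gets a chance to apply.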
The theorem has a precise precondition: the network's connectome must be implicit. In a Transformer like GPT there is no fixed wiring between neurons. The "connections" emerge dynamically at every forward pass from the weight matrices — the connectome exists nowhere as a structure, only as a mathematical pattern inside dense weights.
Neuromorphic chips such as Intel's Loihi or IBM's NorthPole work in a fundamentally different way: they have an explicit, physically wired connectome. Every synapse has a fixed address in silicon — exactly like the biological brain. A new connection can be added without disturbing the rest of the network.
The theorem does not apply here — not because of a technical trick, but because the structural precondition is absent. An explicit connectome makes incremental learning possible in principle.
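The difference in addressability can be shown with a toy explicit connectome (a plain dictionary standing in for physically wired synapses): growing one new connection changes exactly one entry and nothing else.

```python
# Explicit connectome: every synapse has an address (pre, post) in a sparse map.
synapses = {(0, 1): 0.5, (2, 3): -0.2, (4, 7): 0.9}

before = dict(synapses)
synapses[(5, 8)] = 0.3   # learn ONE new connection at a known address

# Which entries differ after learning?
changed = {k for k in set(synapses) | set(before)
           if synapses.get(k) != before.get(k)}
print(changed == {(5, 8)})   # True: one synapse added, all others untouched
```

Contrast this with the dense case, where a new concept arrives as a rank-1 update that moves every weight: locality here is a property of the data structure itself.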
Current neuromorphic chips do not yet match transformers in scale or precision. However, the roadmaps of Intel, IBM, and TSMC show convergence around 2030: neuromorphic systems at the scale of today's language models, with real-time learning capability and a fraction of the energy consumption. For these systems, the impossibility theorem explicitly does not hold.