Quantum version for the associative learning in large language models
In the TGD framework, the model of associative learning used in large language models (LLMs), discussed here from the TGD point of view, could be generalized to a quantum model of associative learning as it could occur in the TGD inspired theory of consciousness.
I have already discussed LLMs from the TGD point of view (see this, this, and this). One could also consider combining the TGD inspired quantum version of associative learning with the speculative idea of extending a classical computer to a hybrid of classical and quantum computers (see this).
Zero energy ontology from the point of view of LLMs
Zero energy ontology (ZEO) is the first piece of the TGD vision.
- By holography, space-time surfaces are analogous to Bohr orbits as basic objects. This means that a 3-D structure, a 3-surface, determines the 4-surface almost deterministically.
The failure of complete classical determinism is essential. The non-deterministic classical time evolution involves 3-D loci of non-determinism as analogs of the 1-D frames of 2-D soap films.
Different Bohr orbits starting from a fixed 3-surface A at the passive boundary of the CD would lead to different surfaces B at the active boundary of the CD, whose size would increase during the sequence of SSFRs.
The 3-D state at the passive boundary would remain invariant under the sequence of “small” state function reductions (SSFRs). This is the TGD counterpart of the Zeno effect.
The non-deterministic classical time evolution can be modelled by a diffusion equation (diffusion) or a Schrödinger type equation (dispersion). This process would be the quantum counterpart of the diffusion appearing in LLMs (see this). Whether this process could be seen as an analog of a path integral, as a sum over a discrete set of paths identified as Bohr orbits, is an interesting question.
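The difference between the two options can be illustrated with a toy model. The sketch below is only an analogy under stated assumptions: the discrete set of branches is replaced by a hypothetical ring of 16 sites, and the names ring_laplacian and evolve are introduced here for illustration. Diffusion applies exp(-tL) to a probability vector, whereas the Schrödinger type evolution applies exp(-itL) to an amplitude vector, L being the graph Laplacian.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of a ring with n sites (toy stand-in for a discrete set of branches)."""
    lap = 2.0 * np.eye(n)
    for i in range(n):
        lap[i, (i + 1) % n] -= 1.0
        lap[i, (i - 1) % n] -= 1.0
    return lap

def evolve(lap, state, t, quantum):
    """Apply exp(-t*L) (diffusion) or exp(-i*t*L) (Schrodinger type) to the state."""
    evals, evecs = np.linalg.eigh(lap)  # L is real symmetric
    factor = np.exp((-1j if quantum else -1.0) * evals * t)
    return evecs @ (factor * (evecs.T @ state))

n = 16
start = np.zeros(n)
start[0] = 1.0  # everything starts at a single site
lap = ring_laplacian(n)

p = evolve(lap, start, t=1.5, quantum=False).real  # probabilities
psi = evolve(lap, start, t=1.5, quantum=True)      # complex amplitudes

print("diffusion: total probability =", p.sum())
print("dispersion: total |psi|^2     =", np.vdot(psi, psi).real)
```

In the diffusive case the probabilities themselves spread and stay non-negative. In the dispersive case only the squared moduli of the amplitudes are probabilities, so interference between paths is possible.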
Holography = holomorphy hypothesis and learning process
The holography = holomorphy hypothesis allows one to reduce the classical field equations to purely algebraic conditions (f1,f2)=(0,0), where the fi are analytic functions of one hypercomplex and 3 complex coordinates of H=M^4×CP_2. The solutions are minimal surfaces irrespective of the classical action as long as it is general coordinate invariant and expressible in terms of the induced geometry. This means universality of the dynamics, and holomorphy expresses quantum criticality. This implies a saddle surface property for the space-time surface, meaning that the real parts of the fi do not have minima or maxima in general.
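The absence of minima and maxima can be checked in the simplest possible case. The following is only an illustration with a single complex coordinate z = x + iy and the hypothetical choice f(z) = z^2, not the full situation with one hypercomplex and 3 complex coordinates. The real part u of a holomorphic function is harmonic, so the eigenvalues of its Hessian sum to zero:

$$
u = \mathrm{Re}\, f, \qquad \partial_x^2 u + \partial_y^2 u = 0 \quad\Rightarrow\quad \lambda_1 + \lambda_2 = 0 .
$$

For f(z) = z^2 this gives

$$
u = x^2 - y^2, \qquad \mathrm{Hess}(u) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix},
$$

so the only critical point, the origin, is a saddle: u increases along x and decreases along y.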
Interestingly, the near absence of minima, meaning that most extrema are saddle points, is essential for the success of LLMs, a success which is in fact not well understood. In LLMs, the cost function V, measuring the size of the training error, is minimized in the parameter space by gradient dynamics. If most extrema are saddle points, the process does not get stuck in a local minimum and learning becomes very effective.
Furthermore, in LLMs local flatness of the parameter space helps, since it increases the probability that the gradient dynamics leads to the minimum and reduces the probability of leaving the minimum under a small perturbation.
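The role of saddle points in gradient dynamics can be made concrete with a minimal sketch. The cost function V(x, y) = x^2 + (y^2 - 1)^2 is a hypothetical toy choice, not an actual LLM loss: it has a saddle at (0, 0) and minima at (0, ±1). The function names are likewise introduced only for this illustration.

```python
import numpy as np

def cost(p):
    """Toy cost V(x, y) = x^2 + (y^2 - 1)^2 with a saddle at (0, 0)."""
    x, y = p
    return x**2 + (y**2 - 1.0) ** 2

def grad(p):
    """Gradient of the toy cost."""
    x, y = p
    return np.array([2.0 * x, 4.0 * y * (y**2 - 1.0)])

def gradient_descent(start, lr=0.1, steps=200):
    """Plain gradient descent with a fixed learning rate."""
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

# Started exactly on the line y = 0, the dynamics runs into the saddle.
on_saddle_line = gradient_descent([1.0, 0.0])
# A tiny perturbation in y is amplified and the dynamics reaches a minimum.
perturbed = gradient_descent([1.0, 1e-3])

print("exact start:    ", on_saddle_line, "V =", cost(on_saddle_line))
print("perturbed start:", perturbed, "V =", cost(perturbed))
```

A saddle point is stationary but unstable: only a measure-zero set of initial conditions ends up there, whereas a genuine local minimum traps every nearby initial condition.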
Could the minimal surface property somehow prevent the sticking in the present case?
- It is useful to consider the situation first at the level of a single space-time surface rather than WCW. At the space-time level, all points are saddle points in the geometric sense: the minimal surface property states that the trace of the second fundamental form, an analog of acceleration identifiable as a sum of extrinsic curvatures, vanishes. The absence of minima and maxima in this sense need not have anything to do with the absence of sticking in some dynamics.
- The quantum learning process would occur in the “world of classical worlds” (WCW), the space of Bohr orbits, rather than at the space-time level. By an analog of Langlands duality (see this), the vacuum functional, an exponent of the classical action, is proposed to have also a purely number theoretic expression, which would mean computability and an enormous simplification (see this).
The Kähler function K, defining the vacuum functional as its exponential, plays a central role. Also the degeneracies of the maxima are important. The maxima of the exponential of the Kähler function are thermodynamic analogs of Boltzmann weights, and their degeneracy is measured by entropy. One can say that the minimization of energy and the maximization of entropy compete.
Note that K is determined only modulo the addition of a real or imaginary part of a holomorphic function of the complex coordinates of WCW. In complex coordinates, the Kähler metric of WCW is of the form
$G_{MN^*} = \partial_M \partial_{N^*} K$.
The maxima of the vacuum functional, which correspond to minima of the Kähler function when the vacuum functional is written as exp(-K), are of special interest. The Euclidean signature of the metric puts strong constraints on K at its minima. The criticality condition means that some second partial derivatives of K with respect to the real coordinates vanish.
A good example is the metric of the complex plane, ds^2 = dz dz*, for which K = zz* has a minimum at the origin. The metric is flat.
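The complex plane example can be written out explicitly. With the convention used above for the line element, and with f denoting an arbitrary holomorphic function introduced here only for the check, one has

$$
K = z z^*, \qquad G_{zz^*} = \partial_z \partial_{z^*} K = 1, \qquad ds^2 = G_{zz^*}\, dz\, dz^* = dx^2 + dy^2 .
$$

In the real coordinates, K = x^2 + y^2 has a positive definite Hessian, so the origin is a minimum. The metric is insensitive to the addition of the real part of a holomorphic function:

$$
K \to K + f(z) + f(z)^* \quad\Rightarrow\quad \partial_z \partial_{z^*} \left( f(z) + f(z)^* \right) = 0 ,
$$

since f depends only on z and its conjugate only on z*.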
A model for the learning process
How could the learning process take place?
- The learning process can be seen mathematically as the construction, by a subsystem, of a representation for the dynamics of the external world. Associations A1→ B1 for the dynamics of the external world serve as teaching material, and a representation of these as associations A→ B in the internal model world is constructed as a model for the dynamics of the external world.
The external world states A1→ B1 would actually correspond to the sensory percepts of the states of the external world, and in the learning process the system learns to predict B1 as a consequence of A1.
In the TGD based model for sensory perception (see this and this) as construction of standardized mental images, the feedback loop between sensory organs and magnetic body would make this possible in the same way as in pattern recognition. The deviation of B from B1 is minimized. This deviation would define the virtual sensory input from the magnetic body to the sensory organ.
Classically A and B (A1 and B1) correspond to 3-surfaces at the boundaries of a CD and A (A1) is fixed in ZEO. At the quantum level, one has zero energy states as superpositions of orbits A→ B (A1→ B1).
The failure of determinism corresponds to the 3-D loci of non-determinism at the Bohr orbit of A, and the discrete variables parametrizing the non-determinism correspond to the parameter space of LLMs.
The space of 3-surfaces at the passive or active boundary of the CD would correspond to the latent space as a subspace of the space of features (see this). The cutoff on the degree of the polynomial and on the dimension of the Galois group of the polynomial would induce the analog of dimensional reduction, replacing the feature space with a latent space. This cutoff would also reduce the parameter space as the discrete space characterizing the classical non-determinism. The TGD counterpart of the loss landscape (see this) corresponds to a subspace of the parameter space.
The construction of a representation means finding non-deterministic space-time surfaces A→B in the CD producing an optimal representation for the pair A1→ B1, meaning that B is as near as possible to B1. The error function measures the deviation of B from B1. In LLMs the error function is minimized by a gradient method. The counterpart of this method in the case of the construction of conscious associations should be understood. The fact that the TGD Universe is fractal is expected to help considerably in the construction of conscious associations as representations.
- The representation could be seen as a simplified version of the original obtained by scaling the size of the CD, either up or down.
- The reduction of the degree of polynomials used and the algebraic dimension of extension E reduce the complexity. The restriction of an extension of E to E reduces complexity and the hierarchies of extensions of E define complexity hierarchies.
- Also the hierarchies of analytic maps (f1,f2)→ (g1(f1,f2),g2(f1,f2)) define iteration hierarchies analogous to those associated with fractals and with the approach to what looks like chaos. One can also “imagine” more complex systems at the level of representation by extending E or performing these iterations.
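How iteration builds a complexity hierarchy can be seen in a one-variable toy version. The sketch below is a simplification under stated assumptions: the pair (g1, g2) is replaced by a single hypothetical polynomial g(w) = w^2 - 1, and the helper names are introduced here only for illustration. Each iteration doubles the polynomial degree, so the number of roots grows exponentially with the iteration depth.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def compose(g, f):
    """Coefficients of g(f(w)), computed with Horner's scheme."""
    result = [g[-1]]
    for coeff in reversed(g[:-1]):
        result = poly_add(poly_mul(result, f), [coeff])
    return result

def iterate(g, n):
    """The n-fold iterate g(g(...g(w)...))."""
    result = list(g)
    for _ in range(n - 1):
        result = compose(g, result)
    return result

g = [-1, 0, 1]  # hypothetical example: g(w) = w^2 - 1
degrees = [len(iterate(g, n)) - 1 for n in range(1, 5)]
print(degrees)  # [2, 4, 8, 16]
```

Truncating the hierarchy at a given iteration depth is one way to picture a cutoff on polynomial degree, while going deeper corresponds to "imagining" a more complex system.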
A hybrid of classical and quantum computer and quantum model for associative learning
This picture can also be formulated within the speculative vision (see this) in which a classical computer becomes a living system as a hybrid of classical and quantum computers.
- A quantum computation-like process would be associated with classical computation. The classical non-determinism could be maximal in the sense that each tick of the computer clock would involve loci of classical non-determinism making the outputs of the gates non-deterministic.
Classical computation would correspond to the most probable Bohr orbit in the representation of the computation as a zero energy state. If localization in WCW is possible (position measurement in the discrete degrees of freedom of WCW due to non-determinism), it could select a single Bohr orbit.
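The idea of the classical computation as the most probable path can be sketched with a purely classical toy model. Everything in it is an assumption made for illustration: three clock ticks, two possible gate outputs per tick, and invented branch probabilities. Quantum amplitudes and interference are ignored.

```python
from itertools import product
from math import prod

# Hypothetical branch probabilities for the gate output at each clock tick.
ticks = [
    {"0": 0.9, "1": 0.1},
    {"0": 0.2, "1": 0.8},
    {"0": 0.6, "1": 0.4},
]

def path_probabilities(ticks):
    """Probability of every output sequence, assuming independent ticks."""
    paths = {}
    for outcome in product(*(t.keys() for t in ticks)):
        paths["".join(outcome)] = prod(t[o] for t, o in zip(ticks, outcome))
    return paths

paths = path_probabilities(ticks)
most_probable = max(paths, key=paths.get)  # analog of the classical computation

print(most_probable, paths[most_probable])
```

The remaining paths carry the rest of the probability. In this toy picture, a localization to a single path would correspond to picking one key of the dictionary.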
For a summary of earlier postings see Latest progress in TGD.
For the lists of articles (most of them published in journals founded by Huping Hu) and books about TGD see this.
Source: https://matpitka.blogspot.com/2025/01/quantum-version-for-associative.html