It’s hard to talk to someone in “Generative AI” these days without being asked what my latest memory architecture is. Do you use LangChain? What about embeddings? What does long-term memory look like?
As humans, we collectively seem to have a strange fascination with eidetic memory, probably because it is so elusive to us. Facts like where I last put my keys, who I spoke with, or even what I ate earlier in the day quickly fade from working memory. And perhaps because we can’t quite remember what they were, we incorrectly assume they would have been valuable to us.
Computers historically have been the exact opposite - eidetic memory without any meaningful abstract processing. Given that we’ve spent the last 60 years dedicated to creating machines that remember everything for us, offloading our memory ever further, it’s not surprising that everyone I speak with is trying to force eidetic memory into language model applications! Anything else would be a dramatic disruption to the traditional computing paradigm, even if it would be perfectly reasonable to someone who had never met a computer.
The primary reason we think differently is that almost everything we believe about computers as processing machines - the metaphors and interaction paradigms - can be traced back to a single demo by a single person. The “Mother of All Demos” by Engelbart in 1968… Let me say that again, 1968… demonstrated windows, hypertext, graphics, mouse-driven navigation and command input, video conferencing, word processing, collaborative real-time editing, revision control, and dynamic file linking. Beyond that single demo, Engelbart had a vision of, well, basically corporate ChatGPT: a system that took in all your data and output insights. So basically everything we’ve seen in computers up until now has relied heavily on (1) memory and (2) a single man’s vision from nearly 60 years ago.
Today, inside corporations, we’re obsessed with “knowledge wikis,” and in hobbyist communities seeking to continue the Engelbartian legacy it’s “knowledge graphs” - the spartan evolution of a personal wiki. We create knowledge gardens that need tending for some reason, but never check if there’s a harvest.
Go to conferences around the future of computing and you’ll see people showing off how much “stuff they know” in their Roam knowledge graphs.
However, I’m here to tell you that memory (while important) is not really the thing to focus on if you’re interested in transforming our relationship with machines.
Here’s an extreme example - my cat walks around my house, and as far as I can tell she can remember fewer discrete things about the world than I could fit in an 8k-token GPT context window. The set of facts she knows: where her food bowl is, what time of day she’s going to be fed, who I am, and that my baby smells funny and she should probably stay away. And that’s about it.
And yet, my cat has a much richer connection to me than any machine! She has an emotional processing center that understands the emotional context of our relationship and how to contribute to it. I’d much rather spend time with my cat than a computer.
Notably, my cat has roughly 1,000× fewer neurons than GPT has parameters, and I still find her a better life partner than a machine.
But, you might object, that line of argument rests on purely emotional grounds. Well, even for logical processing, similar lines of reasoning hold.
Efficient forgetting is a human superpower - as we sleep, our brains are constantly pruning memories to leave room for processing more important things. But you probably won’t believe truisms from psychology, so here’s a personal story. At a bare minimum, what I’m about to share is an existence proof that eidetic memory is irrelevant, and probably harmful, for information processing.
Here’s a photo of my computational reasoning system during my PhD studying theoretical quantum physics:
When I began my PhD, I started using lab notebooks. But I would often get stuck “processing” inside the notebook and have to go up to a whiteboard or chalkboard to write anything meaningful down. What I eventually discovered is that the harmful property of the notebook is simply the visual presence of extraneous information - this extra information is highly problematic for abstract reasoning. If the problem you intend to work through is wildly complicated, bringing extra context into a metabolically unfavorable¹ reasoning process destroys your ability to think.
Instead, the beauty of a yellow notebook is that every time the context needs to reset (like creating a new thread in ChatGPT), you just flip the page and are left with a blank slate.
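In LLM terms, flipping the page is just starting a fresh context. Here’s a minimal sketch of the idea - the `complete` function is a hypothetical stand-in for whatever model call you use, not any particular library’s API:

```python
# A minimal sketch of "flipping the page" in an LLM workflow.
# `complete` is a hypothetical stand-in for any language model call.

def complete(prompt: str) -> str:
    """Stand-in for a completion call to a language model."""
    raise NotImplementedError

class YellowNotebook:
    def __init__(self) -> None:
        self.page: list[str] = []  # the current working context, nothing else

    def write(self, line: str) -> str:
        self.page.append(line)
        # The model only ever sees the current page, never the stack
        # of old notebooks behind it.
        return complete("\n".join(self.page))

    def flip_page(self, carry_forward: str = "") -> None:
        # Reset the working context, optionally carrying one distilled
        # insight forward rather than the raw transcript.
        self.page = [carry_forward] if carry_forward else []
```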
Now, what you gain in ability to process efficiently, you lose in memory. Here, the yellow notebook might be the single worst invention for memory other than loose-leaf sheets of paper. I can tell you definitively that after flipping back maybe 10 pages and noticing that what you’re looking for isn’t there, you give up. Especially when you have a stack of notebooks that look like this.
And yet I still managed to write over 30 peer-reviewed papers during my PhD, each containing wholly new knowledge, so I will tell you with 100% certainty that the secret isn’t memory. To be clear, many PhDs are considered successful with a single paper or two, and often convert into faculty positions at that rate. So if I feel qualified to tell you one thing, it is that storing all your “knowledge in a graph” is not what’s holding you back.
So what is going on then?
The process by which humans acquire information has actually been thoroughly studied - by cyberneticists like Gordon Pask, and by others from a lineage of epistemology and semiotics, who came to deeply model what happens inside a conversation².
One of the key insights here is that humans are abstraction learners, who accumulate associative graphs of abstractions. What we actually commit to memory are efficient representations of new concepts, whose instantiations are dependent upon our prior known representations.
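To make that concrete, here’s a toy sketch of such an associative graph - my own illustration, not Pask’s formalism - where each new concept can only be instantiated in terms of concepts already known:

```python
# Toy model of an associative graph of abstractions: each concept is
# stored as a compressed label plus links to previously learned concepts.

from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str                                   # the efficient representation
    depends_on: set[str] = field(default_factory=set)

class AbstractionGraph:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def learn(self, name: str, depends_on: set[str] = frozenset()) -> None:
        # A new concept can only be grounded in what is already known;
        # dependencies on unknown concepts are rejected.
        missing = set(depends_on) - set(self.concepts)
        if missing:
            raise ValueError(f"cannot ground {name!r} without {missing}")
        self.concepts[name] = Concept(name, set(depends_on))

g = AbstractionGraph()
g.learn("calculus")
g.learn("stochastic calculus", {"calculus"})
g.learn("quantum stochastic calculus", {"stochastic calculus"})
```

Note what is not stored: the raw experiences that produced each concept. Only the compressed representation and its links to prior representations survive.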
Applying this to my PhD work: what matters in knowledge creation is not the prior knowledge itself, but what exists at the edge and the frontier, and the representations of those concepts. Creation of new knowledge is play around the frontier of yet-to-be-connected concepts.
Then what of forgetting?
During the last months of my PhD, I had been working on solving a singular problem off and on for approximately ten years. At least once a month, I would attempt to write down an analytical solution to this problem³ and fail, until one day I didn’t.
I can’t really tell you what changed between the times I failed to write down the solution and the day I succeeded. All I can tell you is that it didn’t work, until one day it did. Now, I can articulate many of the specific abstract concepts, like quantum stochastic calculus, which I had accumulated and later discovered were dependencies of the new knowledge I created, but I don’t know why I solved it the day that I did. And I never will.
Ultimately, being able to solve seemingly unsolvable problems requires forgetting. You have to forget why you failed.
It was forgetting why I had failed the day before, and in exactly what way I had failed, that meant I could approach the problem again each day with fresh eyes. I didn’t forget the abstractions and insights that led to its solution, but without forgetting all of the extraneous details I never could have solved an unsolvable problem.
This experience serves as a powerful inspiration for my perception of how machines should work - they probably don’t need, and perhaps shouldn’t have, eidetic memory. And so, if we want to transform our relationship with machines, we need to look away from memory and embrace the power of efficient forgetting.
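If I were to sketch what efficient forgetting might look like in a machine - entirely my own illustration, not a reference design - it would keep the distilled abstractions forever and let the episodic details decay:

```python
# Illustrative sketch of "efficient forgetting": episodic details decay
# over time and get pruned, while distilled abstractions persist.

import time

class ForgettingMemory:
    def __init__(self, half_life_s: float = 86_400.0) -> None:
        self.half_life_s = half_life_s
        self.episodes: list[tuple[float, str]] = []  # (timestamp, raw detail)
        self.abstractions: set[str] = set()          # what we keep forever

    def observe(self, detail: str) -> None:
        self.episodes.append((time.time(), detail))

    def distill(self, abstraction: str) -> None:
        # Commit only the compressed lesson, not the raw experience.
        self.abstractions.add(abstraction)

    def prune(self, threshold: float = 0.5) -> None:
        # Drop episodes whose retention (exponential decay with the given
        # half-life) has fallen below the threshold; yesterday's failed
        # attempts disappear, the insight stays.
        now = time.time()
        self.episodes = [
            (t, d) for (t, d) in self.episodes
            if 0.5 ** ((now - t) / self.half_life_s) >= threshold
        ]
```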
1. Our logical reasoning systems are wildly metabolically unfavorable, hence why we use heuristics for everything.
2. Machine learning as a field was originally interested in this work, but that interest came too early. Our next set of computational insights is going to come from a blast from the past, now that we have LLMs that can function as arbitrary abstraction transformers up to a certain level of complexity.
3. If you’re interested, the precise problem was an analytical form for stimulated emission. Side note: it turns out everyone had misunderstood this process going back to Einstein, and I wasn’t crazy after all.
Interesting. But if the creation of new knowledge depends on finding new connections between existing information, then we need to be more specific about what should and should not be forgotten. When you solved the problem, you didn’t so much forget everything you knew about calculus as some of the connections between the pieces you were playing with. In this sense, it seems we should be pruning or “forgetting” some of the edges in the knowledge graph, but not so much the nodes. Some amount of eidetic memory still seems like a prerequisite for such knowledge production. This seems related to dropout in neural networks, or to why meditation and journaling are helpful for problem solving. RAM needs to be clear for efficient processing, but long-term memory should still be available. Without eidetic memory, there is nothing to build on.
This is interesting. Sometimes I have cursed my lack of eidetic memory - but I can see how your final point is probably why I was able to succeed, after failing for so long, when confronting a hard problem.
While it might then be ideal to let machines forget, I wonder if something akin to journaling, and potentially later “stumbling upon and reflecting on” those notes, might not be a good intermediate solution.